This paper describes the implementation of a compiler for the programming language C. The compiler has
been designed to be capable of producing assembly-language code for most register-oriented machines
with only minor recoding. Most of the machine-dependent information used in code generation is
contained in a set of tables which are constructed automatically from a machine description provided by
the implementer. In the machine description, the implementer defines a machine-dependent abstract
machine for which the code generator produces intermediate code. The abstract machine is abstract in
that it is a C machine: its registers and memory are defined in terms of primitive C data types and its
instructions perform basic C operations. The abstract machine is machine-dependent in that there is a
close correspondence between the registers of the abstract machine and those of the target machine,
and between the behavior of the abstract machine instructions and the corresponding target machine
instructions or instruction sequences. The implementer defines the translation from an abstract machine
program to a target machine program by providing in the machine description a set of simple macro
definitions for the abstract machine instructions. In addition, macro definitions may be provided in the
form of C routines where additional processing capability is needed.

1. Introduction

This paper describes the implementation of a compiler for the programming language C [1,2], an
implementation language developed at Bell Laboratories and a descendant of the language BCPL [3]. The
compiler has been designed to be capable of producing assembly-language code for most
register-oriented machines with only minor recoding. Versions of the compiler exist for the Honeywell
HIS-6000 and Digital Equipment Corporation PDP-10 computers.

C is a procedure-oriented language. It has four primitive data types (integers, characters, and single-
and double-precision floating-point), four data type constructors (pointers, arrays, functions, and records),
and a small but convenient set of control structures which encourage goto-less programming. An
important characteristic of C is the minimal run-time support needed. Although C supports recursive
procedures, C does not have built-in functions, I/O statements, block structure, string operations, dynamic 
arrays, dynamic storage allocation, or run-time type checking. The only run-time data structure is the 
stack of procedure activation records. Of course, to run any useful programs, an interface to the 
operating system is required, and a standard set of I/O routines has been defined in order to encourage
portability. But the implementation of these routines is optional and separate from the task of 
implementing a C compiler which produces code for a given machine. 

The compiler described in this paper was designed to be portable, that is, to be capable of generating 
code for many target machines with a minimum of recoding. When considering portability, three classes of 
machines can be defined: 

1. Machines which can support C programs reasonably efficiently: This class of machines depends only 
upon one's interpretation of the term "reasonably efficiently." Clearly, all real machines can run C
programs, limited only by some size constraint related to the availability of memory. However, the 
following capabilities are desirable: (1) the ability to access the current procedure activation record 
and the current argument list in a reentrant manner - this will require one or two base/index 
registers depending upon the calling sequence, (2) the ability to reference via a pointer variable - 
this will require another base/index register or an indirection facility, (3) character addressing, (4) 
integer arithmetic, and (5) floating-point arithmetic. Not all of the above capabilities need be present 
in the target machine; however, the more that are missing, the more interpretive becomes the 
execution of a C program. For example, the HIS-6000 is word-addressed; thus references to 
character variables are interpreted by a small run-time subroutine. 

2. Machines for which the compiler can produce reasonably efficient code: This class of machines is 
clearly a subset of the first class; the size of the subset is again determined by one's definition of 
reasonable. The better the correspondence between the target machine and the machine model 
implicit in the compiler, the better will be the object code produced. On the other hand, if the 
correspondence is poor, the compiler may be able to produce only threaded code or instructions to 
be interpreted by software. 

3. Machines which can support the compiler itself: Because the compiler is written in C, one may think 
that this class of machines is identical to the second class of machines; however, there are added 
restrictions which must be made in order to run the compiler on a given machine: the word size of 
the machine must be sufficient to hold all values used by the compiler; any implementation restriction 
on the size of procedures or data areas (as would be likely on the IBM S/360 because of addressing 
deficiencies) must not be such as to prohibit the proper execution of the compiler (this includes the
ability of the compiler to compile itself). In addition, there are operating system and configuration 
restrictions: the memory size available to a program must be sufficient to hold the phases of the 
compiler; file space for the source of the compiler must be available and affordable; the I/O routines 
used by the compiler must be implemented. This class of machines is not a subset of the second class 
of machines since the compiler does not use all of the features of the language, notably floating-point. 

This paper concentrates on the second class of machines, those for which the compiler can produce
reasonably efficient code, given the restrictions of the first class of machines, those which can support C
programs reasonably efficiently. Thus, throughout this paper, the term "machine independence" will
generally refer to the ability of a compiler to produce code for many machines.

1.1 Motivation 

One of the serious problems in the field of software engineering is the difficulty of transferring programs
to new machines. This is caused in large part by the proliferation of different programming languages
and machines and the significant effort required to implement a compiler for any particular programming
language and target machine. One approach to solving this problem is to restrict programming languages
to a few standardized languages which are then implemented on all target machines of interest. A
disadvantage of this approach is that it conflicts with the desirability of having many specialized
languages for specialized problems. Another disadvantage is the fact that continual progress is being
made in the development of programming languages so that by the time a language is standardized and
widely available, it is already "obsolete." It is also difficult to achieve compatibility among the various
implementations of a standardized language. Even if the standard language is well defined, it is difficult
for compiler writers to restrain themselves from extending it and for users to restrain themselves from
using the language extensions. A similar approach to the problem of program transferability is to restrict
the number of target machines for which a compiler must be written by insisting that each new machine
be compatible with a widely-used existing machine. The stifling of progress in computer architecture
which would result from this requirement is as undesirable as the stifling of progress in programming
languages which would result from adoption of the previous approach. In addition, if the new machines
are only upward compatible with the old machines, then problems may still remain with regard to
transferring programs from new machines to old ones.

An alternative approach to those of language restriction and machine compatibility is to develop
techniques that reduce the effort required to write compilers for various combinations of languages and
machines. These techniques may be directed at two subproblems, that of reducing the effort involved in
writing one particular compiler and that of reducing the effort involved in writing a family of related
compilers. The development of such techniques could have benefits beyond program transferability, such
as making it easier to implement a new language or making a language more widely available.

An early effort in this direction was an attempt to devise a universal computer-oriented language UNCOL
[4], which is both language-independent and machine-independent, to which all programming languages
could be translated and which itself could be translated with acceptable efficiency into any machine
language. The idea was that one need write only one UNCOL-to-machine-language translator for each
target machine and one source-language-to-UNCOL translator for each source language, rather than
having to write one compiler for each source language/machine language combination. In addition, if
UNCOL were well defined, then the various implementations of UNCOL could be made compatible, thereby
insuring the compatibility of the source language implementations. Unfortunately, the concept of a
universal language has not led to a practical solution of the problem: the characteristics of source and
machine language independence are incompatible with the need for acceptably efficient translation from
UNCOL to machine language.

More practical techniques for reducing the effort involved in writing compilers result if one considers
techniques with more limited goals than those of the UNCOL project. One approach is to develop
techniques which reduce the effort involved in writing one particular compiler for some language-machine
combination. Examples of such techniques are parser generators and syntax-directed symbol processors
[5]. Another approach is to develop techniques for writing families of compilers for many source
languages and one target machine. An example of such a technique is a compiler writing system with
code generation primitives, such as FSL [*]. The third approach, and the one which is taken in this work,
is that of the portable compiler, a compiler for a particular source language which can produce code for
many target machines. It should be noted that techniques such as parser generators, which can aid in the
implementation of a single compiler, can be equally useful in the implementation of more general systems
such as compiler writing systems and portable compilers.

1.2 Background

A compiler can be considered to consist of two logical phases, analysis and generation. The analysis 
phase performs lexical and syntactic analysis of the source program, producing as output some convenient 
internal representation of the program, along with a set of tables containing lexical information and other 
information derived from the declarative statements of the program. The generation phase then 
transforms the internal representation into an object language program, using the information contained in 
the tables produced by the analysis phase. One can confine the machine (object language) dependencies 
of a compiler to the generation phase by a suitable choice of internal representation, i.e. one which is 
machine-independent. On the other hand, it is not practical to also confine the source language 
dependencies of a compiler to the analysis phase since this would make the internal representation a 
universal language. Thus the generation phase of a compiler is both source-language-dependent and 
machine-dependent. 

Most portable compilers require that the generation phase be completely rewritten for each target 
machine [7,8]. This effort may represent only about one-fifth of the effort needed to rewrite the entire 
compiler [8]. In the case of the BCPL compiler [9], for example, moving the compiler may require only
three to four weeks under ideal conditions (but otherwise may require up to five months). However, it 
would be desirable if the amount of recoding necessary to generate code for a new machine could be 
reduced. 

One approach is that advocated by Poole and Waite for writing portable programs [10,11]. They
advocate that before writing a program to solve a particular problem, one define an abstract machine for 
which the program is then written. With this approach, in order to move the program to a new machine, 
one need only implement the abstract machine on the target machine, typically via a macro processor. 
The desired qualities of the abstract machine are that it contain operations and data objects convenient 
for expressing the problem solution, that it be sufficiently close to the target machines of interest so that 
acceptable code can easily be generated, and that the tools for implementing the abstract machine be 
easily obtainable on the target machines. 

This technique can be applied to portable compilers by considering the problem to be the implementation 
of an arbitrary source language program. The operations and data objects convenient for expressing the 
problem solution are then those which are basic to the source language. With this technique, a compiler 
would be broken into two parts: a machine-independent translator from the source language to the 
abstract machine language and a machine-dependent translator from the abstract machine language to the 
target machine language. The translator from the abstract machine language to the target machine 
language should be smaller and simpler than the conventional generation phase would be; typically, it 
consists of a set of macro definitions which map each abstract machine instruction into the corresponding 
target machine instruction or instruction sequence. Moving the compiler to a new machine simply requires 
rewriting the macro definitions. 
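
The macro-definition step described above can be sketched in C. This is only an illustration under stated assumptions: the abstract instruction names (AM_LOAD and so on), the target mnemonics, the register name r0, and the '#' placeholder convention are all invented here and are not taken from any machine description discussed in this paper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical abstract machine instructions (names are illustrative). */
enum am_op { AM_LOAD, AM_ADD, AM_STORE, AM_INC, AM_NOPS };

/* One macro definition per abstract instruction, written in an invented
   target assembly syntax. '#' marks where the operand is substituted.
   AM_INC shows an abstract instruction that maps to a sequence. */
static const char *const macro_def[AM_NOPS] = {
    "LOAD r0,#",                          /* AM_LOAD  */
    "ADD r0,#",                           /* AM_ADD   */
    "STORE r0,#",                         /* AM_STORE */
    "LOAD r0,#\nADD r0,=1\nSTORE r0,#"    /* AM_INC   */
};

/* Expand one abstract instruction into target assembly text.
   Output is truncated to fit; returns the number of characters written. */
int expand_macro(enum am_op op, const char *operand, char *out, size_t outsize)
{
    const char *t;
    size_t n = 0;

    assert((int)op >= 0 && op < AM_NOPS && outsize > 0);
    for (t = macro_def[op]; *t != '\0'; t++) {
        if (*t == '#') {
            const char *src = operand;      /* substitute the operand */
            while (*src != '\0' && n + 1 < outsize)
                out[n++] = *src++;
        } else if (n + 1 < outsize) {
            out[n++] = *t;                  /* copy template text */
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

Under this scheme, retargeting amounts to replacing the contents of macro_def; expand_macro itself does not change.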

The major difficulty with the abstract machine approach to portable software is in determining the 
appropriate abstract machine. If the abstract machine is of a high level (i.e., very problem-oriented), then
the program will be very portable but the implementation of the abstract machine will be difficult. On the
other hand, if the abstract machine is of a low level (i.e., more machine-oriented), then, unless it
corresponds closely to the target machine, either the code produced will be inefficient or the 
implementation will be complicated by optimization code. 

The solution to this difficulty proposed by Poole and Waite is to define a hierarchy of abstract machines, 
ranging from a high-level problem-oriented abstract machine to a low-level, machine-oriented, and easy- 
to-implement abstract machine. In this solution, the higher-level abstract machines are implemented in 
terms of the lower-level abstract machines, and only the lowest-level abstract machine need be 
implemented on a target machine in order to transfer the program; once it is transferred, higher-level
abstract machines may be implemented directly in terms of the target machine in order to improve 
efficiency. While this technique may be useful for transferring particular programs, it is unlikely that it
will be acceptable in practical terms as a compilation technique because of the need for additional
translation steps. An experiment by Brown indicates that one may implement and then optimize a
low-level abstract machine in about the same time as it takes to implement a higher-level abstract
machine and that the resulting implementations are similarly efficient. Thus an alternative solution is to
use a low-level abstract machine, leaving the implementer to optimize its implementation. This solution is
more likely to be acceptable as a compilation technique. A third solution will be presented in this paper.

The technique of rewriting the generation phase requires that a non-trivial translator from the internal
representation to the target machine language be written for each new target machine. Similarly, the
abstract machine approach requires that a translator from the abstract machine language to the target
machine language be written for each new target machine; if reasonably efficient code is desired and the
abstract machine does not correspond very closely to the target machine, then this translator will also be
non-trivial.

A more desirable goal for a portable compiler is that it have a generation phase which can be modified to
produce code for a new target machine by a process which is largely automatic. Implicit in this goal is
the requirement that the modification process obtain its information from a (non-procedural) description
of the machine. An early effort in this direction was the SLANG system [13], which attacked the problem
of describing a machine-dependent process (code generation) in a machine-independent way. In the
SLANG system, source language constructs are translated into a set of basic operations called EMILs;
the EMILs are translated into absolute machine code using macro definitions and instruction format
definitions. The approach resembles the abstract machine approach in that the EMILs can be considered
to be the instructions of an abstract machine; the difference is that the code generation algorithm uses
information contained in a machine description in order to tailor the EMIL program to the target machine.
The EMILs differ from the instructions of a Poole and Waite abstract machine in that they are
machine-oriented, rather than problem (source language) oriented. In addition, the code generator does
not seem to know about registers other than index registers, which implies that one will not be able to
achieve the desired close correspondence between the abstract machine and register-oriented machines.
Nevertheless, the method of describing the instructions of a machine by providing simple instruction
sequences which interpret the abstract machine instructions seems to be a good compromise between
the desire to minimize coding and the difficulty of mathematically defining a machine and utilizing such a
definition.

More recently, Miller [14] has explored the problem of constructing a code generator from a machine
description. Miller proposes that a generation phase be constructed in two steps. In the first step, the
language designer specifies the language-dependent part of the generation phase by writing a set of
procedural machine-independent macro definitions for the operations of the internal representation
produced by the analysis phase. These macro definitions define the operations of the internal
representation, such as addition, in terms of machine-independent (i.e., language-oriented) primitives, such
as integer addition, which are created by the language designer. In the second step, the implementer
provides a description of the target machine which is used by an automatic code generation system
named DMACS (Descriptive Macro System) in order to fill out the macro definitions of the first step and
thereby produce a code generator for the target machine. As was the case with the SLANG system, the
DMACS machine description defines the primitives in terms of code sequences which interpret them. In
addition, however, the permitted locations of the operands (in terms of their being in memory or in
particular registers) are specified, as are the corresponding result locations. Thus the primitives can be
made to correspond very closely to the instructions of the target machine so that the code sequences in
the machine description are simpler and the resulting object code is more efficient.

Both the SLANG system and DMACS are intended to be general in that they are not designed for a
specific source language. However, true generality is difficult to obtain and the systems do reflect
preconceived notions about source languages. It is believed that, since there are much more significant
variations among languages than among machines, a practical implementation of a compiler for any [...]
recognized to some extent in DMACS, where the primitives are created by the language designer as
convenient for expressing the operations of the source language. On the other hand, DMACS contains no
notion of storage classes (different mechanisms for accessing variables of the same data type), which are
needed for C; the implementation of storage classes is machine-dependent and thus must be defined in
the machine description. In this paper, techniques similar to those used in the SLANG system and in
DMACS are used in the implementation of a portable C compiler.

1.3 Method 

The goal of this research is to design a generation phase for a C compiler which can be modified to 
produce code for many machines by a process which is largely automatic. Some insight into this problem 
can be gained by examining the corresponding, but better understood problem of the automatic 
construction of an analysis phase. One common approach is the use of a parser generator [15]. A parser
generator is a program which accepts as input a grammar for a source language and produces as output a 
set of tables which are used by a language-independent parsing algorithm. The parsing algorithm is 
supplemented by a set of action routines which are provided by the implementer; these action routines 
are called by the parsing algorithm at appropriate points to produce the output of the analysis phase. 
The important characteristics of this process are as follows: 

1. The analysis phase is divided into two parts, a language-independent part (the parsing algorithm) and 
a language-dependent part (the parsing tables and the action routines). 

2. The language-dependent tables are constructed automatically from a finite description of the language 
(the grammar). 

3. The analysis phase is "filled-in" by the implementer by providing information in a procedural form (the 
action routines). 

4. The choice of a specific parsing algorithm determines the class of languages which can be handled by 
the analysis phase. 
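
The division of labor in points 1 through 3 can be sketched in C. To keep the example short it uses a hand-written finite-state table rather than real parser tables, so it is a stand-in for the idea and not a parser generator's output; all names (next_state, action, run_tables) and the tiny input language of sums such as "12+3" are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>

/* Language-dependent part: tables. A generator would build these from a
   description of the language; here they are written by hand. */
enum { S_START, S_NUM, S_ERR, N_STATES };
enum { C_DIGIT, C_PLUS, C_OTHER, N_CLASSES };

static const int next_state[N_STATES][N_CLASSES] = {
    /* S_START */ { S_NUM, S_ERR,   S_ERR },
    /* S_NUM   */ { S_NUM, S_START, S_ERR },
    /* S_ERR   */ { S_ERR, S_ERR,   S_ERR }
};

/* Language-dependent part: action routines supplied by the implementer. */
struct sum_state { long total; long current; };

static void act_digit(struct sum_state *s, int c)
{
    s->current = s->current * 10 + (c - '0');
}

static void act_plus(struct sum_state *s, int c)
{
    (void)c;
    s->total += s->current;
    s->current = 0;
}

typedef void (*action_fn)(struct sum_state *, int);
static const action_fn action[N_CLASSES] = { act_digit, act_plus, 0 };

/* Language-independent part: the driver only consults the tables and
   calls the action routines. Returns 1 and stores the sum on success. */
int run_tables(const char *input, long *result)
{
    struct sum_state s = { 0, 0 };
    int state = S_START;

    assert(input != 0 && result != 0);
    for (; *input != '\0'; input++) {
        int cls = isdigit((unsigned char)*input) ? C_DIGIT
                : (*input == '+')                ? C_PLUS
                :                                  C_OTHER;
        state = next_state[state][cls];
        if (state == S_ERR)
            return 0;               /* input not in the language */
        action[cls](&s, *input);    /* never reached for C_OTHER */
    }
    if (state != S_NUM)
        return 0;                   /* input ended after a '+' */
    *result = s.total + s.current;
    return 1;
}
```

Point 4 shows up here as well: because the driver is a finite-state recognizer, no choice of tables would let it handle nested parentheses.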

The process of constructing an analysis phase can be made more automatic through the use of a compiler 
writing system. In a compiler writing system, the action routines are in a sense built-in; the implementer
invokes these action routines from a higher-level description of the translation. The use of such a system 
may involve much less effort than would be required to write a complete set of action routines. However, 
the important point here is that the use of built-in knowledge, as opposed to allowing the addition of 
arbitrary procedural knowledge, restricts the class of translations (and thus source languages) which can 
be handled by the automatically generated analysis phase. 

For the compiler described in this paper, techniques analogous to those described in the preceding 
paragraph are used in the implementation of the generation phase. The generation phase is split into two 
parts, a machine-independent part and a machine-dependent part. The machine-independent part of the 
generation phase is a machine-independent code generation algorithm, corresponding to the language- 
independent parsing algorithm of the analysis phase. Just as the choice of a particular parsing algorithm 
limits the class of languages that the analysis phase can handle (the parsing algorithm is not completely
language-independent), the choice of a particular code generation algorithm determines the class of 
machines for which the compiler can produce reasonable (non-interpretive) code. The machine-dependent 
part of the generation phase consists of a set of tables produced automatically by a stand-alone program 
(Generate Tables) from a machine description, which corresponds to the grammar in the construction of 
an analysis phase. The information contained in the machine description may be supplemented by a set of 
routines which correspond to the action routines of the analysis phase. However, the compiler described 
in this paper is closer to the compiler writing system approach in that implementer-supplied routines form 
only a minor part of the generation phase. The extent to which the implementer can easily and safely 
include such routines in the generation phase represents another factor determining the class of target 
machines handled.

A code generation algorithm, if it is to be machine-independent, requires a model of a machine with which
to work. This model may express such notions as memory, registers, addressing, operations, and
hardware data types. In the machine description, the implementer defines the target machine in terms of
this model and also specifies the form of the object language. The class of machines for which the code
generator can produce acceptable code directly corresponds to the generality of the machine model.

The machine model used by the C compiler is a C machine: a machine whose registers and memory are
described in terms of the primitive C data types and whose operations are primitive C operations. The
implementer models the target machine in terms of a C machine, producing an abstract machine. The
abstract machine may be very similar to or very different from the target machine, depending upon how
closely the target machine fits the machine model. The code generator, working in terms of this model,
produces code for the abstract machine. The "assembly" language of the abstract machine is called
the intermediate language; an intermediate language program, which is in the form of a series of macro
calls, is translated into the target machine assembly language using a set of macro definitions provided by
the implementer in the machine description. Assembly language was chosen over machine language for
the output of the compiler because it is far easier to describe and produce in a machine-independent
manner than machine code or object modules.

The abstract C machine plays the same role in the C compiler as would a Poole and Waite abstract
machine. The difference is that instead of there being one fixed abstract machine, there is a class of
abstract machines, corresponding to the variability in the machine model. This variability allows the
implementer to define a particular abstract machine which more closely resembles his target machine.
The result is that the translation from the abstract machine language to the target machine language
becomes simpler, and more efficient code is produced.

The process of modeling the target machine is described in chapter two. A detailed discussion of the
code generation algorithm is presented in chapter three. Conclusions are presented in chapter four.




2. Modeling the Target Machine 

The code generator's model of a machine is an abstract C machine, a machine whose instructions perform
the primitive operations of the C language. The data types of the abstract machine are the primitive C
data types (characters, integers, and single- and double-precision floating point), supplemented by one or
more pointer classes which are distinguished by their ability to resolve addresses. The basic addressable
unit of the abstract machine memory is the byte, which holds a single character value (characters are the
smallest C data type). Values of the other data types occupy an integral number of bytes, possibly
aligned in larger units of memory. The abstract machine has a set of registers which may be used to hold
the operands of the abstract machine instructions. Each abstract machine register is capable of holding
values of some subset of the abstract machine data types. The instructions of the abstract machine are
three-address instructions. Each address may specify an abstract machine register or a location in
memory; the mechanisms for referencing a memory location correspond to the primitive addressing
modes in C.

In the machine description, the implementer describes the target machine in terms of this machine model
by defining a particular abstract machine for which the code generator will produce code. The
implementer specifies the sizes and alignments of the primitive C data types and defines pointer classes
as convenient. The implementer defines a set of abstract machine registers, which generally correspond
to those registers of the target machine which are to be used in the evaluation of expressions. The
implementer also specifies the registers which may hold values of each of the abstract machine data
types. In addition, the implementer may specify that any two abstract machine registers conflict in the
target machine, meaning that only one may hold a value at any one time. The implementer defines the
abstract machine instructions in terms of their operand/result locations and possible side-effects on other
registers. In addition, the implementer provides a set of macro definitions which implement the abstract
machine instructions on the target machine.
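
The kinds of information listed above can be pictured as a C data structure. The sketch below is an assumption-laden illustration, not the format of the actual machine description or of the generated tables: the type encoding, the three-register example machine, its sizes and alignments, and the pick_register helper are all invented here.

```c
#include <assert.h>

/* Abstract machine data types (a hypothetical encoding). */
enum am_type { T_CHAR, T_INT, T_FLOAT, T_DOUBLE, T_PTR, N_TYPES };

#define N_REGS 3

struct am_description {
    int size[N_TYPES];           /* size of each data type in bytes     */
    int align[N_TYPES];          /* alignment of each data type         */
    unsigned holds[N_REGS];      /* bit t set: register may hold type t */
    unsigned conflicts[N_REGS];  /* bit r set: conflicts with register r */
};

/* An invented target: registers 0 and 1 hold characters, integers and
   pointers; register 2 holds floating-point values; registers 1 and 2
   are assumed to share hardware, so they conflict. */
static const struct am_description example_machine = {
    { 1, 4, 4, 8, 4 },
    { 1, 4, 4, 8, 4 },
    { (1u << T_CHAR) | (1u << T_INT) | (1u << T_PTR),
      (1u << T_CHAR) | (1u << T_INT) | (1u << T_PTR),
      (1u << T_FLOAT) | (1u << T_DOUBLE) },
    { 0u, 1u << 2, 1u << 1 }
};

/* Choose a register that can hold `type`, is not busy, and does not
   conflict with any busy register. `busy` has bit r set when register r
   holds a value. Returns the register number, or -1 if none is usable. */
int pick_register(const struct am_description *m, enum am_type type,
                  unsigned busy)
{
    int r;

    assert(m != 0 && (int)type >= 0 && type < N_TYPES);
    for (r = 0; r < N_REGS; r++) {
        if (!(m->holds[r] & (1u << type)))
            continue;                 /* wrong data type for this register */
        if (busy & (1u << r))
            continue;                 /* already holds a value */
        if (m->conflicts[r] & busy)
            continue;                 /* a conflicting register is busy */
        return r;
    }
    return -1;
}
```

A code generator written against such a structure never names a target register directly, which is what lets the same algorithm serve different machine descriptions.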

2.1 The Intermediate Language 

The intermediate language is the assembly language of the abstract machine. Using the information
contained in the tables constructed from the machine description, the code generator produces a
translation of the source program in the intermediate language. An intermediate language program
consists of a sequence of macro calls, each of which is translated into one or more object language
statements using the macro definitions provided in the machine description. There are two types of
macros in the intermediate language. The first type consists of macros which represent the three-address
abstract machine instructions. The second type consists of keyword macros, which are translated into
assembly-language pseudo-operations or instructions implementing the primitive C control structures.

2.1.1 Abstract Machine Instructions

The abstract machine instructions are three-address instructions which perform the evaluation of C
expressions. The operators of the abstract machine instructions are called abstract machine operators
(AMOPs); the addresses are called references (REFs).

2.1.1.1 AMOPs 

AMOPs are basic C operations which are qualified by the specific abstract machine data types of their
operands. For example, in the HIS-6000 implementation there are four AMOPs corresponding to the C
operator +:

+i     integer addition
+d     double-precision floating-point addition
+p0    addition of an integer to a pointer to a byte-aligned object
+p1    addition of an integer to a pointer to a word-aligned object

In addition, there are AMOPs for data movement, data type conversion, and conditional jumps. AMOPs are
represented in the compiler as an integer opcode with a value from 0 to 256. The various AMOPs are
listed in Appendix II.
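
As a rough illustration of how a C operator is qualified by operand types, the following sketch selects among four addition AMOPs. The enumerator names, their integer values, and the select_add_amop function are hypothetical; only the four kinds of addition come from the text above.

```c
#include <assert.h>

/* Operand types relevant to addition (a hypothetical encoding). */
enum am_type { T_INT, T_DOUBLE, T_PTR_BYTE, T_PTR_WORD, N_TYPES };

/* Integer opcodes for the addition AMOPs; the values are made up. */
enum amop {
    OP_NONE   = -1,
    OP_ADD_I  = 0,   /* integer addition                              */
    OP_ADD_D  = 1,   /* double-precision floating-point addition      */
    OP_ADD_P0 = 2,   /* integer added to pointer to byte-aligned obj  */
    OP_ADD_P1 = 3    /* integer added to pointer to word-aligned obj  */
};

/* Pick the AMOP for the C operator + from its operand types.
   Returns OP_NONE when a conversion would have to be applied first. */
int select_add_amop(enum am_type left, enum am_type right)
{
    assert((int)left >= 0 && left < N_TYPES);
    assert((int)right >= 0 && right < N_TYPES);
    if (left == T_INT && right == T_INT)
        return OP_ADD_I;
    if (left == T_DOUBLE && right == T_DOUBLE)
        return OP_ADD_D;
    if (left == T_PTR_BYTE && right == T_INT)
        return OP_ADD_P0;
    if (left == T_PTR_WORD && right == T_INT)
        return OP_ADD_P1;
    return OP_NONE;
}
```

A mixed case such as integer plus double returns OP_NONE here; in this sketch it is left to the caller to insert a data type conversion and retry.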

2.1.1.2 REFs 

A REF is a C-oriented description of the location of an operand or the result of an abstract machine
instruction. A REF may specify either a register of the abstract machine or a location in memory; the
possible classes of memory references include C variables of various storage classes (automatic, static,
external, parameter, temporary) as well as constants and indirect references. A REF is represented by a
pair of integers called REF.BASE and REF.OFFSET; REF.BASE determines either a particular register or a
particular class of memory references, and REF.OFFSET determines the exact location given a specific
memory reference class. The possible values of REF.BASE are listed below with their interpretations
(actual integer values are shown for concreteness; the compiler itself uses manifest constants):

REF.BASE    interpretation

n >= 0      register n (register numbers are assigned to the registers of the abstract
            machine in a predictable manner by GT)
-1          an automatic or temporary variable; OFFSET is the offset of the variable in the
            stack frame
-2          an external variable, referenced by name; OFFSET is an internal identifier
            number
-3          a static (internal) variable; OFFSET is an internal static variable number
-4          a parameter; OFFSET is the offset of the variable or its address in the
            argument list
-5          a label; OFFSET is an internal label number
-6          an integer constant whose value is OFFSET
-7          a floating-point constant; OFFSET is an internal constant number
-8          a character string constant; OFFSET is an internal string number
n <= -9     an indirect reference through a pointer held in a register; OFFSET is the offset
            of the reference relative to the pointer



The specific values of REF.BASE need not be referred to in most macro definitions; the exception is the
NAME macro, which converts a REF into a symbolic address.

The representation of a three-address instruction in the intermediate language is that of a macro call with
five or seven integer arguments representing the AMOP and the REFs for the result and the operands of the
AMOP. (Each REF consists of two arguments, REF.BASE and REF.OFFSET; only two REFs are provided in
the case of a unary AMOP.) The macro name used in the macro call identifies an
entry in a table produced from the machine description by GT; the table entry refers to the
representation of the corresponding macro definition from the machine description.
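
The encoding just described can be pictured with a small C sketch. It is only an illustration under stated assumptions: the enum and function names are hypothetical (the compiler's own manifest constant names are not given here), and the argument order (AMOP, result, first operand, second operand) is assumed rather than taken from the source.

```c
#include <stddef.h>

/* REF.BASE codes for the memory reference classes, using the integer
   values shown in the table above. The names are hypothetical. */
enum {
    RB_AUTO   = -1,   /* automatic or temporary variable */
    RB_EXTERN = -2,   /* external variable */
    RB_STATIC = -3,   /* static (internal) variable */
    RB_PARAM  = -4,   /* parameter */
    RB_LABEL  = -5,   /* label */
    RB_INTLIT = -6,   /* integer constant */
    RB_FLTLIT = -7,   /* floating-point constant */
    RB_STRLIT = -8    /* character string constant */
};

/* A REF is a pair of integers. */
struct ref {
    int base;     /* REF.BASE: register number or memory reference class */
    int offset;   /* REF.OFFSET: location within that class */
};

/* A non-negative base names an abstract machine register. */
int ref_is_register(struct ref r)
{
    return r.base >= 0;
}

/* Flatten a three-address instruction into the integer arguments of a
   macro call. Pass second == NULL for a unary AMOP. Returns the number
   of integers written: 5 for a unary AMOP, 7 for a binary one.
   The argument order is an assumption made for this sketch. */
int encode_instruction(int amop, struct ref result, struct ref first,
                       const struct ref *second, int out[7])
{
    int n = 0;
    out[n++] = amop;
    out[n++] = result.base;
    out[n++] = result.offset;
    out[n++] = first.base;
    out[n++] = first.offset;
    if (second != NULL) {
        out[n++] = second->base;
        out[n++] = second->offset;
    }
    return n;
}
```

A binary instruction such as an integer addition thus occupies seven integers, while a unary instruction such as a negation occupies five.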

2.1.2 Keyword Macros 

Keyword macros are those macro calls which, along with the three-address instructions, make up an
intermediate language program. Unlike AMOP macros, whose names are generated by GT, the names of
the keyword macros are predefined, as are their functions. For example, keyword macros are used to
define external variable names and internal labels, to specify initial values in storage, and to produce the
function prologs and epilogs. The various keyword macros defined in the intermediate language are listed
below along with a brief description of their functions; a more complete set of descriptions appears in
Appendix III.

macro       function

HEAD        produce header statements, if needed
ENTRY       define an entry point
EXTRN       define an external reference
INT         define an integer constant
CHAR        define a character constant
FLOAT       define a floating-point constant
NFLOAT      define a negative floating-point constant
DOUBLE      define a double-precision float constant
NDOUBLE     define a negative double-precision constant
ADCONn      define a class n pointer constant
STRCON      define a pointer referencing a string constant
EQU         define a symbol
ZERO        define an area of storage initialized to zero
STATIC      define a static variable
STRING      define the string constants
ALIGN       force an alignment of the location counter
LN          define a line-number symbol
LABCON      define a label constant
LABDEF      define an internal label
IDN         translate an internal identifier number
            into the corresponding assembler symbol
END         produce an end statement, if needed
PROLOG      produce the prolog code of a C function
EPILOG      produce the epilog code of a C function
CALL        produce a function call
RETURN      produce code for a return statement
GOTO        produce a jump to a label expression
LSWITCH     produce a switch jump (list version)
TSWITCH     produce a switch jump (table version)

The actual macro names which appear in an intermediate language program are abbreviations of the
names listed above.

2.2 The Machine Description

The machine description is a "program" written in a special-purpose language from which are constructed
the machine-dependent tables of the generation phase. The machine description has two functions: (1) it
defines the particular abstract machine for which the code generator produces intermediate code, and (2)
it specifies the translation from an intermediate language program to the corresponding object language
program.

The abstract machine is defined in two sections of the machine description. First, a set of definition
statements defines the registers and memory of the abstract machine. Second, in the OPLOC section, the
AMOPs are defined in terms of their operand/result locations. The translation from the intermediate
language to the object language is specified by a set of macro definitions in the macro section of the
machine description. More information on the writing of a machine description may be found in Appendix
I; the machine description used in the HIS-6000 implementation is listed in Appendix IV.



2.2.1 Defining the Abstract Machine



In the machine description, the implementer first defines the registers of the abstract machine. For
example, the statement

regnames (x0,x1,x2,x3,x4,a,q,f);

defines the eight abstract machine registers used in the HIS-6000 implementation. The registers X0
through X4 correspond to the first five of eight HIS-6000 index registers, the A and Q correspond to the
accumulators, and the F register is a fictitious floating-point accumulator which corresponds to the
combined A, Q, and E (exponent) registers on the HIS-6000. The conflict of the F register on the
target machine with the A and Q registers is specified by the statement

conflict (a,f),(q,f); 

The remaining HIS-6000 index registers are not represented in the abstract machine since it was not
desired that they be used by the code generator in evaluating expressions. Two of these registers
hold "environment pointers," the other is used as a scratch register by some of the macro definitions.
There is nothing that requires that the abstract machine registers be implemented as actual machine
registers on the target machine; they may also be implemented as fixed memory locations.

For convenience, the abstract machine registers can be gathered into classes; for example, in the
HIS-6000 implementation, the statement

class x(x0,x1,x2,x3,x4), r(a,q);

defines the class of index registers X and the class of general registers R.

The implementer also defines the classes of abstract machine pointers. Pointer classes are necessary on
machines which are not byte-addressed since pointers to byte-aligned objects will be handled differently
than pointers to word-aligned objects. In the HIS-6000 machine description, the statement

pointer p0(1), p1(4);

defines the class P0 of byte pointers and the class P1 of word pointers. The "4" indicates that the value
of a P1 pointer is always a multiple of four bytes. The fact that there are four bytes per word on the
HIS-6000 is specified in the statement

size 1(char), 4(int,float), 8(double);

A similar statement is used to specify the alignment restrictions.

The statement

type int(r), char(r), float(f), double(f), p0(r), p1(x);

defines the registers which can hold values of each of the abstract machine data types. For example, in
the HIS-6000 implementation, word pointers are held in the index registers X while byte pointers are held
in the general registers R.

The definition of the abstract machine is completed in the OPLOC section of the machine description
where the implementer specifies the behavior of the abstract machine operations in terms of their
operand/result locations. For example, the location definition

+d: f,M,f;

specifies that the AMOP '+d' (double-precision floating-point addition) can take its first operand in the F
register and its second operand in any memory location and, under these circumstances, the result is
placed in the F register. The construct on the right in the location definition is called an OPLOC; it
consists of three location expressions, one each for the first operand, second operand, and result (reading from
left to right). A location expression may specify any set of abstract machine registers or any set of
memory reference classes; for example, the location expression

r|x 

represents the set consisting of the general registers R and the index registers X, and the location 
expression 

-intlit 

represents the set consisting of all memory reference classes except that of integer constants. An OPLOC 
may specify that the result is placed in the first or second operand location. For example, the location 
definition 

+i: r,M,1;

specifies that the AMOP '+i' (integer addition) takes its first operand in a general register and its second
operand in any memory location, and the result is placed in the register which contained the first
operand. This location definition is equivalent to

+i: a,M,a; q,M,q;

which explicitly lists the two alternatives. An OPLOC may also specify that the contents of certain
registers are destroyed during the execution of an AMOP; for example, the location definition

*i: q,M,q [a];

specifies that an integer multiplication destroys the contents of the A register.
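
One way to picture an OPLOC inside a compiler is as a record of register sets held as bit masks. The following C sketch is an illustration only, not the representation actually built by GT: the register numbering, type names, and function names are assumptions, and the model is restricted to a first operand in a register and a second operand in memory.

```c
/* Hypothetical numbering of the eight HIS-6000 abstract machine registers. */
enum { X0, X1, X2, X3, X4, A, Q, F };

#define BIT(r)   (1u << (r))
#define CLASS_X  (BIT(X0) | BIT(X1) | BIT(X2) | BIT(X3) | BIT(X4))
#define CLASS_R  (BIT(A) | BIT(Q))

/* A location expression: a set of registers and/or "any memory" (M). */
struct loc_expr {
    unsigned regs;        /* acceptable registers, one bit each */
    int      any_memory;  /* nonzero if any memory reference is acceptable */
};

/* An OPLOC: operand locations, result location, clobbered registers. */
struct oploc {
    struct loc_expr first, second;
    int      result_is_first;  /* result is left where the first operand was */
    unsigned result_regs;      /* otherwise, the set of result registers */
    unsigned clobbered;        /* registers destroyed by the operation */
};

/* +i: r,M,1;      integer addition */
static const struct oploc add_int = { {CLASS_R, 0}, {0, 1}, 1, 0u, 0u };

/* *i: q,M,q [a];  integer multiplication, destroys A */
static const struct oploc mul_int = { {BIT(Q), 0}, {0, 1}, 0, BIT(Q), BIT(A) };

/* Is "first operand in first_reg, second operand in memory" allowed?
   (A second operand in a register is not modeled in this sketch.) */
int oploc_accepts(const struct oploc *o, int first_reg, int second_in_memory)
{
    int first_ok  = (o->first.regs & BIT(first_reg)) != 0;
    int second_ok = second_in_memory && o->second.any_memory;
    return first_ok && second_ok;
}

/* The registers that may hold the result, given the first operand's register. */
unsigned oploc_result_regs(const struct oploc *o, int first_reg)
{
    return o->result_is_first ? BIT(first_reg) : o->result_regs;
}
```

With this representation the single OPLOC for '+i' behaves like the two explicit alternatives: a first operand in A yields a result in A, and a first operand in Q yields a result in Q.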

2.2.2 Defining the Object Language 

The translation from the intermediate language to the object language is specified by a set of macro 
definitions included in the machine description; macro definitions are provided for the abstract machine 
instructions and the keyword macros. The simplest form of a macro definition is a single character string 
which is substituted for the macro call during macro expansion. For example, the macro definition for 
floating-point unary minus used in the HIS-6000 implementation is 

-ud: " FNEG" 

This macro definition specifies that each occurrence of a '-ud' abstract machine instruction is to be
translated into the assembly language instruction "FNEG" which complements the contents of the F
register. The macro definition for '-ud' is closely related to the location definition for '-ud',

-ud: f,,1;

which states that the operand is found in the F register and that the result is placed in the F register. A 
macro definition for an AMOP can assume that the actual operand/result locations appearing in an 
abstract machine instruction satisfy the constraints specified in the corresponding location definition; at 
the same time, a macro definition must produce correct code for all combinations of operand/result 
locations allowed by the location definition. 

A macro definition for an abstract machine instruction can refer to symbolic representations of the
operation and the operand/result locations by using the character sequences #O (operation), #F (first
operand), #S (second operand), and #R (result). These character sequences are abbreviations for calls to
an implementer-defined macro which converts an AMOP opcode or a REF into the desired object language
representation. For example, the macro definition for '+i' (integer addition) in the HIS-6000
implementation is

+i: " AD#R #S"

If the first operand location (which is also the result location) is the A register and the second operand is
an external variable "X", then the code produced by this macro definition is

ADA X

which adds the contents of "X" to the A register. A macro definition can also contain character strings
whose inclusion in the expansion of a macro call is conditional upon the locations of the operands and
result. An example is the HIS-6000 macro definition

<<:
(,intlit,): " #FLS %o(#S)"
(,~intlit,): " LXL5 #S
#FLS 0,5"

which produces different code sequences depending upon whether or not the second operand (the
number of bit-positions to shift) is an integer constant. A macro definition may include references to the
arguments of the macro call using the character sequences #0, #1, ..., #9; a macro definition may also include
embedded macro calls, such as the "%o(#S)" in the last example, which returns the value of the integer
constant.
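
The simple string form of macro definition amounts to textual substitution. The following C sketch shows the idea under stated assumptions: the substitution sequences are taken to be written #O, #F, #S, and #R, the function name and signature are hypothetical, and the caller supplies the symbolic names directly (in the compiler they are produced by an implementer-defined macro).

```c
#include <stddef.h>
#include <string.h>

/* Expand a macro body into out (capacity cap bytes, including the
   terminating NUL). The sequences #O, #F, #S, #R are replaced by the
   strings op, first, second, result; every other character is copied
   unchanged. Returns 0 on success, -1 if the output does not fit. */
int expand_macro(const char *body, const char *op, const char *first,
                 const char *second, const char *result,
                 char *out, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return -1;
    while (*body != '\0') {
        const char *sub = NULL;
        if (body[0] == '#') {
            switch (body[1]) {
            case 'O': sub = op;     break;
            case 'F': sub = first;  break;
            case 'S': sub = second; break;
            case 'R': sub = result; break;
            default:  break;        /* not a substitution sequence */
            }
        }
        if (sub != NULL) {
            size_t len = strlen(sub);
            if (n + len >= cap)
                return -1;
            memcpy(out + n, sub, len);
            n += len;
            body += 2;              /* skip '#' and the selector letter */
        } else {
            if (n + 1 >= cap)
                return -1;
            out[n++] = *body++;
        }
    }
    out[n] = '\0';
    return 0;
}
```

Expanding the body " AD#R #S" with result "A" and second operand "X" gives " ADA X", matching the integer addition example.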

A macro definition may also be specified in the form of a C routine. C routine macro definitions are used
when processing is needed which is beyond the capabilities of the simple macro scheme so far described.
C routine macro definitions may define global variables, perform arithmetic and logical operations, and
select code sequences on conditions other than operand locations;
however, C routine macro definitions are unable to interact with the code generation algorithm. In the
HIS-6000 implementation, C routine macro definitions are used to translate REFs into GMAP symbols, to
translate the source language representations of identifiers and floating-point constants into GMAP, to
define character string constants, and to buffer characters while defining storage for variables (GMAP
does not have a byte location counter, as is assumed in the intermediate language). The C routine macro
definitions used in the HIS-6000 implementation [...]

3. Generating Code for an Abstract Machine

The most interesting part of the compiler is the code generator since, unlike most code generators which 
produce code for a fixed target language, the code generator of the C compiler is designed to produce 
code for a class of abstract machines. 

3.1 Functions of the Code Generator 

The code generation process consists of three fairly distinct functions. First, there is the generation of 
intermediate language statements to define and initialize static data areas and constants. Second, there is 
the translation of source language control structures into labels and branches. Third, there is the 
translation of source language expressions into sequences of abstract machine operations. 

The C compiler is designed to produce assembly language code for conventional machines; thus, the 
intermediate language statements for defining and initializing static data areas directly correspond to 
assembly language statements which define symbols, define constants, and align the location counter. The 
only complication is that the code generator must use the size and alignment information from the machine 
description in order to specify the sizes and alignments of data areas. More information and redundancy 
could be added to the intermediate language in order to accommodate a larger class of target languages;
see [16] for examples. Another possible improvement would be to emit segment specifying instructions 
so that the output could be segregated into different segments according to whether it is code, pure data, 
impure data, or uninitialized data. 

The process of translating source language control structures into labels and branches is rather 
straightforward. The only complications come when emitting conditional branches which test the value of
an expression; these problems are covered in the next section. 

3.2 Generating Code for Expressions 

The generation of code for expressions is the most difficult part of the problem. The code generator 
must generate a correct sequence of abstract machine instructions to carry out the indicated operations. 
The operand and result locations it specifies in the abstract machine instructions must conform to the 
location definitions provided in the machine description. Moreover, the code generator must keep track of
the locations of all intermediate results and correctly administer the abstract machine registers and 
temporary locations. 

The generation of code for expressions is performed in two steps, semantic interpretation and code 
generation. 

3.2.1 Semantic Interpretation 

The code generator receives expressions in the form of syntax trees whose interior nodes are source 
language operators and whose leaf nodes are identifiers and constants. Thus, an expression can be 
considered to consist of a "top-level" operator along with zero or more operand expressions. The first 
step in the processing of an expression consists of translating a tree in this form to a more descriptive 
form whose interior nodes are AMOPs. This translation involves checking the data types of operands, 
inserting conversion operators where necessary, and choosing the appropriate AMOPs to express the 
semantics of the source language operators. The selection of an AMOP to replace a source language 
operator is based primarily on the data types of the operands. For example, on this basis, an addition 
operator may be translated into either integer addition, double-precision floating-point addition, or one of 
a number of pointer addition AMOPs. However, it is useful to be able to choose AMOPs also on the basis 
of what is provided in the machine description. The basic idea is that of defaults. If the semantics of a 
particular AMOP can be expressed in terms of a composition of more basic AMOPs, then the AMOP can be 
left undefined in the machine description; the code generator can use the equivalent composition of 
AMOPs instead. The advantage of having optional AMOPs is that the implementer need define one of
these optional AMOPs in the machine description only if his definition will result in sufficiently better code
than will be produced using the equivalent composition of more basic AMOPs.

An example of this technique is the handling of a class of C operators called assignment operators. An
example of an assignment operator is '=+', where "L =+ R" is defined to be the same as "L = L + R" except
that the expression L is evaluated only once (it may contain side-effects). Consider an expression
"L =op R." If the corresponding abstract machine assignment operator is defined in the machine
description, then the source language assignment operator is translated into that abstract machine
operator; otherwise, the expression "L =op R" is converted to the equivalent form "L = L op R", except
that there is only one copy of "L" having two pointers to it (a flag is set in the root node of "L" so that
later routines will recognize this fact). Therefore, a particular abstract machine assignment operator need
be included in the machine description only if the code sequences it generates are better than the code
that would be generated by the equivalent assignment expression. An example from the HIS-6000
implementation is the abstract machine operator for integer add-assignment, which is translated
into an add-to-storage instruction. The corresponding floating-point assignment operator is not
defined in the machine description since no floating-point add-to-storage instruction exists on the
machine.
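
The default treatment of assignment operators can be sketched as a small tree rewrite. The C fragment below is a hedged illustration, not the compiler's own code: the node layout, the operator codes, and the function names are all hypothetical, and only the add-assignment case is shown.

```c
#include <stdlib.h>

/* Hypothetical operator codes for this sketch. */
enum { LEAF, OP_ASSIGN, OP_ADD, OP_ADD_ASSIGN };

struct node {
    int op;                     /* operator code or LEAF */
    int shared;                 /* set when two parents point at this node */
    struct node *left, *right;
};

struct node *mknode(int op, struct node *left, struct node *right)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->op = op;
    n->shared = 0;
    n->left = left;
    n->right = right;
    return n;
}

/* If the machine description defines the add-assignment AMOP, leave
   "L =+ R" alone. Otherwise rewrite it in place as "L = L + R", where
   both occurrences of L are the same node, flagged as shared so that
   later routines evaluate it only once. */
struct node *lower_add_assign(struct node *e, int amop_is_defined)
{
    struct node *sum;

    if (e->op != OP_ADD_ASSIGN || amop_is_defined)
        return e;
    e->left->shared = 1;
    sum = mknode(OP_ADD, e->left, e->right);
    e->op = OP_ASSIGN;
    e->right = sum;
    return e;
}
```

Because the left operand is shared rather than copied, any side effects it contains still occur once, which is the property the source language definition requires.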

Other examples of optional AMOPs which have been implemented are the pointer comparison operators
for pointers other than class P0 pointers (the default is to convert to the "lowest common denominator"
pointer class for which the operation is implemented) and the test for null/non-null pointer operators (the
default is to convert the pointer to an integer and test for equality/inequality with 0). Other promising
candidates for being optional AMOPs are [...]

3.2.2 Code Generation

The second step in the processing of an expression is the generation of a sequence of abstract machine
instructions to carry out the evaluation of the expression. This code generation is performed by a set of
recursive routines, some of which will be described in this section. The operation of the code generation
routines is basically top-down. When a call is made to generate code to evaluate an expression, a set of
desired locations for the result of that evaluation is also specified. This specification, along with other
available information about the operands of the top-level operator of the expression, is used to choose
one of the OPLOCs from the top-level operator's location definition in the machine description (location
definitions are described in section 2.2.1). From the chosen OPLOC and, possibly, the desired locations for
the result of the expression are derived sets of desired locations for the operands of the top-level
operator. Recursive calls are then made to generate code to evaluate the operands into these desired
locations. Next, an abstract machine instruction is emitted for the top-level operation. Finally, if
necessary, abstract machine instructions are emitted to move the result of the expression to an
acceptable location.

3.2.2.1 Specifying Desired Locations

A set of desired result locations is specified by a structure called a LOC. A LOC structure has two integer
members, LOC.FLAG and LOC.WORD. The possible values of LOC.FLAG are listed below along with their
interpretations:

LOC.FLAG    interpretation

0           the "result" is the internal label specified by LOC.WORD (used only for
            conditional jump AMOPs)
1           the result is to be placed in a register; acceptable registers are specified by
            one-bits in LOC.WORD (bit 0 corresponds to register number 0, etc.)
2           the result is to be placed in memory; acceptable classes of memory references
            are specified by one-bits in LOC.WORD (this field is used only to select registers
            for pointers in indirect references)
3           the result may be left in any location acceptable for values of the particular
            data type

Note that a particular memory location is never specified as the desired location for a result; rather,
classes of possible memory locations are specified.

For convenience, if the LOC passed to the top-level code generation routine specifies that the result is 
desired in a register, then all registers not capable of containing the particular data type of the 
expression being evaluated (as defined in the TYPE statement of the machine description) are removed 
from the LOC. Similarly, if the LOC specifies memory reference classes, then all indirect classes where the 
pointer register is unable to hold pointers of the corresponding pointer class (as specified by the TYPE 
statement) are removed from the LOC. Thus, where the code generator simply desires that a value be in a
register, it may provide a LOC specifying that the result may be left in any register. 

The removal of "impossible" registers from a LOC is not performed when such an action would leave no 
remaining acceptable registers; this situation can actually occur in certain special cases, such as return 
statements, where an operation requires a value in a register not normally used to hold values of that 
type.
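
The LOC structure and the removal of impossible registers can be sketched in C as follows. This is a minimal illustration: the constant names and the function name are assumptions, and the mask of registers capable of holding the data type is taken as a parameter rather than looked up from the TYPE statement.

```c
/* LOC.FLAG values, as listed in the table above (names are hypothetical). */
enum { LOC_LABEL = 0, LOC_REGISTER = 1, LOC_MEMORY = 2, LOC_ANY = 3 };

struct loc {
    int      flag;  /* LOC.FLAG */
    unsigned word;  /* LOC.WORD: label number or bit set, depending on flag */
};

/* Remove from a register LOC every register that cannot hold the data
   type of the expression. capable has a one-bit for each register that
   can hold the type. If the removal would leave no acceptable register,
   the LOC is returned unchanged. */
struct loc prune_registers(struct loc want, unsigned capable)
{
    if (want.flag == LOC_REGISTER) {
        unsigned kept = want.word & capable;
        if (kept != 0)
            want.word = kept;
    }
    return want;
}
```

A caller that merely wants a value in some register can therefore pass a LOC with all register bits set and rely on the pruning step to narrow it to the registers appropriate for the type.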

3.2.2.2 TTEXPR 

The top-level code generation routine is TTEXPR. The function of TTEXPR is to generate a sequence of 
abstract machine instructions which will evaluate a given expression and leave the result in an acceptable 
location, as specified by a LOC parameter. The operation of TTEXPR begins with the removal of 
impossible cases from the LOC parameter, as described above. Then, TTEXPR passes the expression and 
LOC parameters to a routine CGEXPR, which generates abstract machine instructions to evaluate the 
expression, using the LOC parameter as a non-binding indication of preference. Finally, TTEXPR calls the 
routine CGMOVE to emit, if necessary, abstract machine instructions to move the result to an acceptable 
location. 

3.2.2.3 CGEXPR 

The function of CGEXPR is to generate a sequence of abstract machine instructions which will evaluate a 
given expression. CGEXPR is given a LOC argument which specifies preferred locations for the result of 
the expression; however, unlike TTEXPR, this specification is non-binding and is used only where a choice 
exists. 

The operation of CGEXPR consists basically of testing for a set of special cases and then performing the 
appropriate action, which is usually to call another routine which does the real work. The first special 
case is where the expression node is shared and the expression has already been evaluated; in this case, 
no action need be taken. Another special case is where the top-level operator is a conditional AMOP and 
a value is desired (as opposed to a jump, which is the usual case); in this case, a routine JUMPVAL is 
called to emit the desired code. The other special cases involve particular top-level operators:
indirection, assignment, conditional expression, function call, and the "leaves" of the expression tree,
identifiers and literals; in these cases, the code generation routine corresponding to the particular
top-level operator is called. Finally, in all other cases, the routine CGOP is called to emit code to evaluate the
expression.

3.2.2.4 CGOP

The function of CGOP is to emit code to evaluate an expression whose top-level operator is not one 
special-cased by CGEXPR. Like CGEXPR, CGOP is passed a LOC indicating non-binding preferences for the
location of the result of the expression. 

The operation of CGOP is performed in six steps. First, a routine CHOOSE is called to select an OPLOC 
from the top-level operator's location definition in the machine description. Second, desired locations for 
the operands of the top-level operator are determined. Third, a routine EXPR2 is called which makes 
recursive calls on TTEXPR to emit code to evaluate the operands into the desired locations. Fourth, code 
is emitted to save any registers which are specified in the machine description to be clobbered by the 
execution of the top-level operator. Fifth, the exact location of the result of the expression is 
determined. Sixth, the actual abstract machine instruction for the top-level operator is emitted. 

If the result location specified by the LOC parameter is a label, or if the selected OPLOC specifies that the
result is left in the first or second operand location, then the exact location of the result of the
expression is fixed. Otherwise, a particular register must be chosen from the set of registers specified in
the result field of the OPLOC (the compiler is currently unable to handle OPLOCs which specify a set of
memory references as the location of the result). In the search for a result register, the priorities are as
follows: first, free registers which are preferred result locations; second, busy registers which are
preferred result locations; third, free registers which are not preferred result locations; and fourth, busy
registers which are not preferred result locations. If a busy register is selected, register contents are
saved in temporary locations as necessary.
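
The four-level priority for choosing a result register can be expressed compactly with bit sets. The C sketch below is an illustration under assumptions: the function name is hypothetical, at most 32 registers are assumed, and ties within a priority level are broken in favor of the lowest-numbered register, a choice the text does not specify.

```c
/* Choose a result register. Each argument is a bit set in which bit i
   stands for register i:
     candidates - registers allowed by the result field of the OPLOC
     preferred  - registers that are preferred result locations
     busy       - registers currently holding a value
   Returns the chosen register number, or -1 if candidates is empty. */
int choose_result_register(unsigned candidates, unsigned preferred,
                           unsigned busy)
{
    unsigned tiers[4];
    int t, r;

    tiers[0] = candidates &  preferred & ~busy;  /* free and preferred     */
    tiers[1] = candidates &  preferred &  busy;  /* busy but preferred     */
    tiers[2] = candidates & ~preferred & ~busy;  /* free, not preferred    */
    tiers[3] = candidates & ~preferred &  busy;  /* busy and not preferred */

    for (t = 0; t < 4; t++)
        for (r = 0; r < 32; r++)
            if (tiers[t] & (1u << r))
                return r;
    return -1;
}
```

Note that a busy preferred register is ranked above a free register that is not preferred; when such a register is chosen, its contents must first be saved in a temporary location.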

For the purposes of finding a result register, a register containing an operand is considered free and a 
register containing a pointer to an operand is given lowest priority. A register containing a pointer to an
operand is protected because the implementation of an AMOP may alter the contents of the result register
before the operand referenced by the pointer in that register is used. An example is the following
HIS-6000 code for the AMOP '+p1' (addition of an integer to a pointer to a word-aligned object):

LXL0 I
ADLX0 P

This code loads index register 0 with the integer I and then adds to register 0 the pointer P. (The code
for the AMOP includes the load instruction since in general integers cannot be stored in the HIS-6000
index registers as they are only halfword registers.) If the code generated for P leaves P referenced
through index register 0, the load instruction will "clobber" register 0 before P is accessed by the add
instruction:

LXL0 I
ADLX0 0,0

However, if index register 0 is protected, index register 1 will be chosen instead to hold the result,
producing the following correct code:

LXL1 I
ADLX1 0,0

3.2.2.5 Selecting an OPLOC

The purpose of OPLOC selection is to select a set of operand/result locations for the top-level operator 
of an expression by choosing one of the OPLOCs from the location definition of the operator in the 
machine description. The choice of operand/result locations will affect the amount of code produced to 
evaluate the expression, both because of different code sequences which may be produced by the macro 
definition for the operator and because of additional loading, storing, and saving operations which may be 
required in order to set up the operands and move the result to an acceptable location. A general 
solution, taking into account all possible locations of operands and results, is a complex optimization 
problem. Instead, a more limited approach has been taken which uses the provided preferences for 
result locations and available information about the possible result locations of the top-level operators in 
the operand subexpressions. For example, if an operand is an identifier, then its location is known to be 
a memory reference of a particular class. Similarly, various operators may be defined in the machine 
description to always place their result in one of a particular set of registers. Using information of this 
sort, plus knowledge about the current register usage, a rough estimate can be made of the number of 
additional load and store instructions which will be required for each OPLOC in the location definition; 
from the set of OPLOCs, the one with the lowest additional cost is chosen.

For example, consider the expression "I + (J / K)." (For clarity, source language operator symbols are
used in this example to represent the corresponding integer abstract machine operations.) Assume the
following location definitions (the OPLOCs are numbered for future reference):

+:  r,r,1;            (1)
    r,M,1;            (2)
    M,r,2;            (3)

/:  r1,r,1 [r2];      (4)
    r2,r,1 [r3];      (5)
    r3,r,1 [r4];      (6)
    r1,M,1 [r2];      (7)
    r2,M,1 [r3];      (8)
    r3,M,1 [r4];      (9)


Here M represents all memory reference classes and r represents a set of general registers consisting of
r1, r2, r3, and r4. The division operator is modeling a machine instruction which produces pairs of results
(the quotient and remainder) in adjacent registers. For the division abstract machine operator, only the 
quotient is used; the other register is considered to be "clobbered" by the execution of the operator. 
Note that one can deduce from these location definitions that both operators always leave their results in 
general registers. 

The generation of code for the expression "I + (J / K)" begins with the selection of an OPLOC from the
location definition of the '+' operator. In this case, all of the OPLOCs specify the same set of result
locations (the general registers); thus, the desired locations for the result of the expression do not
affect the choice of OPLOCs. Instead, the choice is made on the basis of the possible locations for the
operands. In this case, the first operand is a variable I which is known to be a memory reference of a
particular class. The second operand is the result of a division operator which is known to leave its
results in either r1, r2, or r3. On this basis, OPLOC (3) is chosen because no extra operations are needed
to move the operands into acceptable locations, whereas both OPLOCs (1) and (2) do require such extra
operations.
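
The choice among OPLOCs (1), (2), and (3) can be reproduced with a deliberately simplified cost model. In the C sketch below, each operand is classified only as "in memory" or "in a register", and one extra operation is charged for each operand whose expected location differs from what the OPLOC requires. The type and function names are hypothetical, and the actual estimate also takes current register usage and result preferences into account.

```c
enum where { IN_MEMORY, IN_REGISTER };

/* A reduced OPLOC: only the required kind of location for each operand. */
struct simple_oploc {
    enum where first;
    enum where second;
};

/* Estimated extra load/store operations: one for each operand whose
   expected location does not match the location the OPLOC requires. */
int extra_moves(struct simple_oploc o, enum where first, enum where second)
{
    return (o.first != first) + (o.second != second);
}

/* Return the index of the OPLOC with the lowest estimated cost; the
   earliest one wins ties. n must be at least 1. */
int choose_oploc(const struct simple_oploc *defs, int n,
                 enum where first, enum where second)
{
    int best = 0;
    int i;

    for (i = 1; i < n; i++)
        if (extra_moves(defs[i], first, second) <
            extra_moves(defs[best], first, second))
            best = i;
    return best;
}
```

For "I + (J / K)" the first operand is expected in memory and the second in a register. Under this model OPLOC (1) costs one extra operation, OPLOC (2) costs two, and OPLOC (3) costs none, so OPLOC (3) is selected.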

Next, a recursive call is made to generate code to evaluate the subexpression "J / K." The desired
locations for the result of this expression are those specified by the chosen '+' OPLOC for its second
operand, namely r, the set of general registers. However, since the '+' OPLOC specifies that the second
operand location is also the location of the result of the '+' operator, the intersection of that location set
with the set of desired locations for the result of the '+' operator is used instead, if that intersection is
non-null. Thus, the following factors are used in selecting an OPLOC for the '/' operator: first, which of
the possible result registers (r1, r2, r3) are desired result locations; second, which of the possible result
registers are free; and third, which of the "clobbered" registers (r2, r3, r4) are free. In this particular
situation, the possible location of the first operand, a memory reference, does not favor any
of the OPLOCs. However, the second operand, which is also known to be a memory reference, favors
OPLOCs (7), (8), and (9).

In addition, when selecting an OPLOC from a location definition, certain OPLOCs may be rejected entirely
because they specify conditions which cannot be met. For example, if an OPLOC specifies (either directly
or indirectly through an operand location) that the [illegible] memory, then that OPLOC will be rejected
if a temporary location is not acceptable. The OPLOC is rejected because, given a value in a register,
the only general method by which the code generator can make that value into a memory reference is by
saving it in a newly allocated temporary location. (Recall that a specific memory location is not provided
for the result, only a set of acceptable memory reference classes.) Similarly, if the result will be in
memory and is desired in memory, the OPLOC will be rejected if there are one or more possible result
memory reference classes which are not acceptable result locations; this is done because the code
generator is not capable of transforming a memory reference from one class to another. Similar checking
is performed on the operand [illegible] in the OPLOC: if an operand is required by the OPLOC to be in
memory but not all non-indirect memory reference classes are allowed, then that OPLOC will be rejected
if the operand operator is not guaranteed to place its result in an acceptable memory location or if it
can place its result in a register but temporary locations are not acceptable. These restrictions allow a
location definition to contain extra OPLOCs which apply only in special cases since such OPLOCs will
never be chosen unless the special cases hold.

An example of how the OPLOC selection method can be utilized in the writing of a machine description is
the following definition of the '+p1' AMOP (addition of an integer to a pointer to a word-aligned object)
taken from a hypothetical HIS-6000 machine description ([illegible] implemented at the time [illegible]).
The code sequence for executing the '+p1' operation in the general case is

LXL0 I
ADLX0 P

where I is the integer in the low-order half of a word in memory and P is the pointer in the high-order
half of a word in memory. The result of this operation is left in an index register; thus the OPLOC for this
code sequence is

[OPLOC missing in source]

However, if both the integer and the pointer must be computed into registers (which occurs frequently in
referencing elements of an array), the integer and the pointer must first be stored in temporary
locations before this code sequence can be applied. The use of this code sequence in
these circumstances results in excessive object code. A better code sequence for this case is

ALS 18
STA TEMP
ADLX0 TEMP

which shifts the integer in the general register into the high-order half, stores it in a temporary
location, and adds it to the pointer in the index register. The OPLOC for this code sequence is

x,r,1;

In the case where the pointer is in an index register and the integer is a constant "n", then the desired
code is

EAX0 n,0

with an OPLOC of

x,intlit,1;

The described OPLOC selection method allows all three OPLOCs to be included in the location definition for
'+p1'. In particular, it guarantees that the third OPLOC will never be selected unless the second operand
is an integer constant.
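
The selection rule illustrated by the three OPLOCs above can be sketched in C. This is only an
illustration under simplifying assumptions, not the compiler's actual routine: each operand is assumed
to have a single known location, every extra move is assumed to cost 1, and the names loc, oploc,
move_cost, and choose_oploc are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical location kinds; the real compiler works with sets of
   registers and memory reference classes. */
enum loc { LOC_MEM = 1, LOC_XREG = 2, LOC_RREG = 4, LOC_INTLIT = 8 };

struct oploc {
    unsigned op1;   /* acceptable locations for the first operand (bit set) */
    unsigned op2;   /* acceptable locations for the second operand (bit set) */
};

/* Cost of getting an operand located in `have` into one of `want`:
   0 if already acceptable, 1 if a move is needed, -1 if impossible
   (a computed value can never be turned into an integer literal). */
static int move_cost(unsigned have, unsigned want)
{
    if (have & want) return 0;
    if (want == LOC_INTLIT) return -1;
    return 1;
}

/* Return the index of the cheapest usable OPLOC, or -1 if all are rejected. */
int choose_oploc(const struct oploc *defs, size_t n,
                 unsigned have1, unsigned have2)
{
    int best = -1, best_cost = 0;
    assert(defs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int c1 = move_cost(have1, defs[i].op1);
        int c2 = move_cost(have2, defs[i].op2);
        if (c1 < 0 || c2 < 0) continue;          /* OPLOC rejected entirely */
        if (best < 0 || c1 + c2 < best_cost) {
            best = (int)i;
            best_cost = c1 + c2;
        }
    }
    return best;
}
```

With a table holding the three cases (memory/memory, index register/general register, index
register/integer literal), the literal case is chosen only when the second operand really is an integer
constant, since it is rejected outright otherwise.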

3.2.2.6 Generating Code for Subexpressions 

After an OPLOC has been selected, CGOP calls a routine EXPR2 to make recursive calls on TTEXPR to
generate code to evaluate the operands of the top-level abstract machine operator. The LOC arguments
passed to TTEXPR in these calls are taken from the operand fields of the selected OPLOC and, in the case
of operators which place their result in an operand location, the desired locations for the result of the
top-level operator. If there are two operands, EXPR2 makes sure that the two operands will not require
the use of the same register (for example, by using a register to hold both one operand and a pointer to
the other operand); this is done by checking the LOCs for "overlap" and removing certain possibilities. In
addition, EXPR2 evaluates first the operand which is more complicated on the basis of the sizes of the 
subtrees for the two operands; this tends to reduce the number of saving and restoring operations 
performed. In the course of generating code to evaluate an operand of a binary abstract machine 
operator, it may be necessary to use the register containing the already computed value of the other 
operand or a pointer used to reference it, in which case code is generated to save the contents of this 
register in a temporary location. Thus, after generating code to evaluate both operands, EXPR2 calls a 
routine RESTORE to generate code, if necessary, to restore the saved value to its original register. 
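
The ordering heuristic used by EXPR2 can be sketched as follows. This is a minimal illustration,
assuming a plain binary tree and assuming that "more complicated" means "has more nodes"; the names
node, tree_size, and first_to_evaluate are hypothetical and do not come from the compiler.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *left, *right;   /* both NULL for a leaf */
};

/* Number of nodes in the subtree rooted at t. */
size_t tree_size(const struct node *t)
{
    if (t == NULL) return 0;
    return 1 + tree_size(t->left) + tree_size(t->right);
}

/* Return 1 if the first operand should be evaluated first, 2 otherwise.
   The larger subtree goes first; ties keep the source order. */
int first_to_evaluate(const struct node *op1, const struct node *op2)
{
    assert(op1 != NULL && op2 != NULL);
    return tree_size(op2) > tree_size(op1) ? 2 : 1;
}
```

For "I + (J / K)" the second operand has three nodes and the first has one, so the division is
evaluated first and no register holding I needs to be saved while it is computed.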

3.2.2.7 Register Management 

The status of the various abstract machine registers with regard to register allocation is contained in an 
array of structures called REGTAB. Each element structure of the array represents the current state of 
one abstract machine register. An element structure consists of two members: UCODE, an integer 
indicating the current use of the register, and REP, a pointer to the subexpression tree whose value is 
currently in the register. The possible values of UCODE are listed below with their interpretations: 

UCODE Interpretation 

0 the register is free 

-1 the register contains the value of the expression pointed to by REP 

-2 the register has been marked "do not use unless necessary" for the purpose of 
finding a register for the result of an AMOP; although the register contains a pointer 
to one of the operands of the AMOP, it is free in that it may be selected as a last 
resort without having to save its contents. 

n>0 the register does not directly contain a value, but there are "n" conflicting registers
containing values which must be saved before this register can be used.
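
The table above might be represented in C roughly as follows. This is a sketch only: the number of
registers, the bit-set representation of register sets, and the decision to count a register with
UCODE > 0 as busy are assumptions, and the names regentry, is_free, nfree, and nbusy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NREGS 8              /* assumed number of abstract machine registers */

struct expr;                 /* subexpression tree node, defined elsewhere */

struct regentry {
    int ucode;               /* 0 free, -1 holds value, -2 marked, n>0 conflicts */
    struct expr *rep;        /* expression whose value is in the register */
};

struct regentry regtab[NREGS];   /* zero-initialized: all registers free */

/* A register is treated as free when UCODE is 0 or -2, since a marked
   register can be taken without saving its contents. */
static int is_free(int r)
{
    assert(r >= 0 && r < NREGS);
    return regtab[r].ucode == 0 || regtab[r].ucode == -2;
}

/* Count the free registers in the set w (bit r set means register r). */
int nfree(unsigned w)
{
    int n = 0;
    for (int r = 0; r < NREGS; r++)
        if (((w >> r) & 1u) && is_free(r)) n++;
    return n;
}

/* Count the busy registers in the set w. */
int nbusy(unsigned w)
{
    int n = 0;
    for (int r = 0; r < NREGS; r++)
        if (((w >> r) & 1u) && !is_free(r)) n++;
    return n;
}
```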

The routines used in register management are described below:

CLEAR(R) - Register R, which must directly contain the value of an expression, is made
available for use; its current value is not saved.

ECLEAR(E) - The register associated with the expression E, if any, is CLEARed.

FREEREG(W) - A register from the set specified by W is made available for use; the
contents of registers are saved if necessary.

GETREG(W1,W2) - If possible, an unmarked register from the set W1 is made available for
use. Otherwise, if possible, an unmarked register from the set W2 is made
available for use. Otherwise, a marked register from the set W1 is made
available for use. Within each set, free registers are chosen in preference
to busy registers; if a busy register is chosen, its contents are saved.

MARK(E) - If the expression E is an indirect reference, the register containing the
pointer is marked "do not use unless necessary."

NBUSY(W) - Return the number of busy registers in the set W.

NFREE(W) - Return the number of free registers in the set W.

RESERVE(R,E) - Register R is allocated to hold the value of the expression E. Register R
[illegible]

RESTORE(E) - If the [illegible] reference) has been saved in a temporary location, it is
restored to the original register.

SAVE(R) - Register R is made available for use by saving the contents of whatever
registers are necessary.

UNMARK(E) - Undo a MARK.
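
The three-tier preference described for GETREG can be sketched in C. This is an illustration under
stated assumptions rather than the compiler's routine: register sets are bit masks, saving a register
is simulated by a counter, any state other than free or marked is treated as busy, and the names
ucode, saves, pick_unmarked, pick_marked, and getreg are hypothetical.

```c
#include <assert.h>

#define NREGS 8              /* assumed number of abstract machine registers */
enum { FREE = 0, HOLDS_VALUE = -1, MARKED = -2 };

int ucode[NREGS];            /* current state of each register */
int saves;                   /* number of simulated save operations */

/* Choose an unmarked register from set w, preferring a free one.
   A busy register is "saved" before being handed out. */
static int pick_unmarked(unsigned w)
{
    int busy = -1;
    for (int r = 0; r < NREGS; r++) {
        if (!((w >> r) & 1u) || ucode[r] == MARKED) continue;
        if (ucode[r] == FREE) return r;
        if (busy < 0) busy = r;
    }
    if (busy >= 0) { saves++; ucode[busy] = FREE; }
    return busy;             /* -1 if the set has no unmarked register */
}

/* Choose a marked register from set w as a last resort. */
static int pick_marked(unsigned w)
{
    for (int r = 0; r < NREGS; r++)
        if (((w >> r) & 1u) && ucode[r] == MARKED) return r;
    return -1;
}

/* Unmarked from w1, else unmarked from w2, else marked from w1. */
int getreg(unsigned w1, unsigned w2)
{
    int r;
    assert((w1 | w2) != 0);
    r = pick_unmarked(w1);
    if (r < 0) r = pick_unmarked(w2);
    if (r < 0) r = pick_marked(w1);
    return r;
}
```

Note that under this reading a busy but unmarked register in the first set is preferred over a free
register in the second set, at the cost of a save.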

The following is a typical series of calls made by CGOP in the generation of code for an expression E
whose top-level operator is a binary operator with operands OP1 and OP2:

OPLOC=CHOOSE(E,[illegible])    choose an OPLOC
EXPR2(OP1,OP2)                 recursively generate code to evaluate
                               the operands into acceptable locations
ECLEAR(OP1)                    make operand registers available for
ECLEAR(OP2)                    the result
SAVE(*)                        save "clobbered" registers, if any
MARK(OP1)                      mark registers used to hold pointers
MARK(OP2)                      to operands
R=GETREG(V)                    select a result register
UNMARK(OP1)                    unmark any marked registers
UNMARK(OP2)
RESERVE(R,E)                   reserve result register


3.2.2.8 Possibilities for Failure 

The code generator can fail in two ways: (1) it can reach an impossible situation and announce a compiler
error, and (2) it can unknowingly generate incorrect code. Examples of impossible situations are (1)
discovering that there are no acceptable OPLOCs in the location definition for an operator, (2) being told
that the result must be placed in a register from the empty set of registers, and (3) discovering that an
essential location definition or macro definition of an abstract machine operator was not provided by the
implementer. The most likely cause of a failure is an incorrect machine description. Examples of errors
which can be made in the machine description are (1) an OPLOC specifying that both operands must be in
the same register, (2) an OPLOC specifying a set of memory reference classes for the result location, (3) a
macro definition containing errors, and (4) a macro definition which does not anticipate a particular
operand or result location, or combination thereof, allowed by the [illegible] essential (in the case of
move operations which must be [illegible] and between registers and memory). Some of these errors could
be detected by the program which processes the machine description (GT). Another possible cause of
failure is a machine with an insufficient number of registers. Such a machine may require that one
register be used to hold both a pointer to an operand and the result of an operation; as described above,
this situation may result in incorrect code. Hopefully, abstract machine models of real machines will not
suffer from this problem. Of course, the other possible cause of failure is a bug in the code generator
itself. It would be interesting and useful if such a code generation algorithm could be proven correct,
given sensible restrictions on the machine description and the assumption of correct macro definitions.

4. Conclusions

This paper has described the implementation of a portable compiler for the programming language C. The
compiler was first implemented by the author in a seven month period on the Bell Laboratories Computer
Science Research Center's PDP-11 UNIX system. The compiler was then used to compile itself, and the
resulting code moved to the HIS-6000. Another month was spent debugging the compiler until the
version of the compiler compiled on the HIS-6000 successfully compiled itself. This was regarded as a
significant test of the compiler.

4.1 The Compiler 

The major problem with the compiler itself is its speed. The compiler appears to be more than twice as
slow as other compilers for similar source languages. This slowness is due almost entirely to the use of a
macro expansion phase (a phase not likely to be present in ordinary compilers), since the compiler tends
to spend half or more of its time in the macro expansion phase. The slowness of the compiler seems to
be a problem inherent in the chosen compiler structure; no amount of mere recoding is likely to
significantly reduce the percentage of time spent in the macro expansion phase. One approach toward
improving the speed of the compiler would be to eliminate non-essential processing such as the
construction and interpretation of character-string representations of macro calls and the rescanning of
macro definitions. The macro language could be modified so that the result of the expansion of a macro
call would never be needed as an argument to another macro call and thus could be printed directly,
rather than returned as a string and rescanned. Given this restriction, the macro definitions could be
compiled into procedures which simply print strings and call other procedures. These procedures could
be called directly by the code generator; alternatively, they could be called by a procedure which
interprets a suitable encoding of the intermediate language.

A second problem with the compiler is its size, in terms of both the amount of file space necessary to
support an implementation of the compiler and the amount of memory required to execute the compiler
phases. The source of the compiler is about 250K characters and the source of GT is about 80K characters;
thus, the file space required for source, object libraries, and executable files is on the order of 1M
characters. Only the size of the code of the code generator is a result of designing the compiler to be
portable; it is likely that a code generator designed for a specific machine would be much smaller. Other
reasons for the large size of the compiler stem from the particular programming techniques used. In
particular, keeping the entire tree representation of a function in core at one time during code generation
requires that a large block of storage be reserved. Also, the use of a bottom-up table-driven LALR(1)
parser seems to result in a larger syntax analysis phase than would result from using recursive descent,
as does the UNIX C compiler. The large size of the compiler limits the number of computer systems which
can support the compiler.

Despite these problems, it is believed that were one prepared to make the investment necessary to
implement C on another machine, the size difficulties and related costs would be outweighed by the
relative speed with which one could bring up a working implementation. One could then concentrate on
making it more efficient, having the advantages of a C compiler to work with and the ability to program in
C.

The least flexible machine-dependent component of the compiler is the code generation algorithm. It is
acknowledged that a clean mechanism for allowing the implementer to tailor the code generation algorithm
through the addition of procedural knowledge would be an improvement. On the other hand, clinging to
the idea that the code of the compiler will never be touched is unrealistic. A likely prospect for
modification is the code related to the calling sequence since it may be desired to use a system standard
calling sequence instead of the one built into the compiler. Another problem which would be solved most
easily by modifying the code generator is the IBM S/360 addressing problem. Because an S/360
instruction cannot contain an arbitrary memory address, C external variables must be referenced by first
loading a register with a pointer to the variable (an address constant) and then using the register as a
base register in the actual instruction. These actions could be performed by the macro definitions using
conditional expansion; however, it would be easier to modify the code generator to handle this particular
case.



The most direct method of moving a portable compiler based on a machine description requires access to
an existing implementation of the compiler. The process of moving a compiler written in its own language
from machine A to machine B is as follows: First, one writes a machine description for machine B.
Second, the machine description is used by a construction program running on machine A to produce a
new compiler which produces code for machine B. Third, the compiler on machine A is used to compile
the new compiler, producing a compiler which runs on machine A but produces code for machine B.
Fourth, the new compiler is used to compile itself, producing a compiler which runs on machine B and
produces code for machine B. This process is called a half bootstrap. On the other hand, the Poole and
Waite approach does not require the use of an existing implementation. One need write only an
interpreter or a translator for a very simple abstract machine language in order to move a program to a
new machine. This technique is called a full bootstrap. In practice, the need for a half bootstrap often
represents a significant obstacle to moving a program.

The full bootstrap method can be used to move a portable compiler based on a machine description as
follows: Initially, a simple imaginary machine is defined as a vehicle for bootstrapping. A compiler which
runs on and produces code for this imaginary machine is then constructed using the half bootstrap
method described above. Now, in order to move the compiler to a new machine, one implements an
interpreter for the imaginary machine on the new machine. This action results in an "existing"
implementation of the compiler, running on the new machine, which can then be used to carry out the
half bootstrap as described above.

4.2 The Compiled Code 

Although there are weak spots, the code produced by the compiler is good considering that it is almost 
completely unoptimized. It is certainly better than would be produced if the abstract machine were the 
typical machine-independent abstract machine with one accumulator and one index register, given the 
same complexity of the macro definitions (they do not perform register allocation). Such an 
implementation would not be able to take advantage of the HIS-6000's two accumulators or the multiple 
index registers, nor would it recognize the fact that byte pointers cannot fit in the index registers. 

One of the weak spots in the compiled code concerns floating-point operations. The code generator 
performs all floating-point operations in double-precision, issuing single-to-double conversion 
operations before using single-precision operands. It is unable to utilize the HIS-6000 machine 
instructions which operate on a single-precision operand in memory and a double-precision operand in 
the F register. Since the implementation of a single-to-double conversion is to load the single-precision 
operand into the F register, very poor code is produced for single-precision floating-point expressions 
(as opposed to very good code for double-precision expressions). One way to handle this situation would 
be to implement a general subtree-matching facility for optimization. With such a facility, the implementer 
specifies in the machine description that a particular combination of abstract machine operators (specified 
in the form of a tree) is to be replaced by the code generator with a new abstract machine operator; the 
new operator is defined by the implementer in the machine description just like any of the built-in 
operators. In the floating-point case, one would specify that a subtree of the form (using a LISP-like 
notation) 

( double-prec-add ( e1 , single-to-double ( e2 ) ) )

would be replaced by 

( single-prec-add ( e1 , e2 ) )

where single-prec-add is a new abstract machine operator which would be defined to be the "FAD" 
instruction. This method of subtree-matching can be compared to the hierarchy of abstract machines
method in that the new abstract machine operators can be considered to be instructions of a higher-level
abstract machine. The differences are that, in the case of the subtree-matching method, the definition of
higher-level operators is optional (thus there is no multistage translation when optimization is not desired
or needed) and that the implementer defines the higher-level operators to suit his needs. The
subtree-matching approach to machine-dependent code optimization has been investigated by Wasilew [17].
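
The floating-point rewrite proposed above can be sketched in C. This is an illustration of the idea
only: the pattern is hard-coded rather than read from a machine description, the conversion is
matched only as the second operand (as in the example), and the names op, tree, and rewrite are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum op { LEAF, DOUBLE_PREC_ADD, SINGLE_TO_DOUBLE, SINGLE_PREC_ADD };

struct tree {
    enum op op;
    struct tree *left, *right;   /* unary operators use left only */
};

/* Rewrite matching subtrees in place, bottom-up, and return the root.
   ( double-prec-add ( e1 , single-to-double ( e2 ) ) ) becomes
   ( single-prec-add ( e1 , e2 ) ). */
struct tree *rewrite(struct tree *t)
{
    if (t == NULL) return NULL;
    t->left = rewrite(t->left);
    t->right = rewrite(t->right);
    if (t->op == DOUBLE_PREC_ADD && t->right != NULL &&
        t->right->op == SINGLE_TO_DOUBLE) {
        assert(t->right->left != NULL);  /* a conversion has one operand */
        t->op = SINGLE_PREC_ADD;
        t->right = t->right->left;       /* drop the conversion node */
    }
    return t;
}
```

A general facility would instead compare each subtree against a list of patterns supplied by the
implementer and substitute the corresponding new operator.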

Another weakness in the compiled code concerns array subscripting. Instead of placing the offset of an 
array element into an index register and performing an indexed memory reference, the code generator 
adds the offset to a pointer to the base of the array, producing a pointer (in an index register) which is 
then used to reference the array element. Thus, the code generator regards index registers only as base 
registers to hold pointers, and not as index registers to hold offsets. One reason for not implementing 
the capability of using index registers for subscripting is that this method of subscripting is often not 
possible. For example, on machines like the HIS-6000 with single-indexed instructions, this method can be 
used only for external and static arrays; all other arrays require the use of an index register just to
reference the base of the array. (Actually, one can perform double-indexing on the HIS-6000 by using 
an indirect word; however, this was not recognized at the time the compiler was written.) The capability 
of using index registers for subscripting could be implemented using the subtree-matching facility 
described above; one would test for subtrees of the form 

( pointer-add ( address-of ( extern | static ), <any> ) ) 

and replace them with a new abstract machine operator which would be defined to produce the desired 
code. A more satisfying solution would give the code generator more knowledge about addressability so 
that it could use index registers for subscripting whenever possible, based on information given in the 
machine description. 

A third weakness of the compiled code is the use of indirection. The code generator only indirects 
through pointers in registers; it is unable to utilize an indirection-through-memory facility (except through 
a specific location which implements an abstract machine register). Again, a better understanding of 
addressing is what is really needed. 

4.3 Summary of Results 

This paper has presented a technique for the design of portable compilers and has demonstrated its 
practicality through the implementation of a portable C compiler. The main difference between this work 
and the previous work described in section 1.2 is that in this work, the system was designed specifically 
for the language being implemented; it is this restriction which contributes most to the practicality of the 
approach. In addition, this work has emphasized the concept of a machine-dependent abstract machine, 
thus tying together the work on portable compilers and program transferability. 

The advantages of the technique presented in this paper over the technique of rewriting some or all of 
the generation phase are (1) that the implementer can modify the compiler to produce code for a new 
machine with less effort and in less time, and (2) that the implementer can be more confident in the 
correctness of the modifications. Almost the entire code of the generation phase, already tested in the 
initial implementation, is unchanged in the new implementation. This code includes the code generation 
algorithm, the register management routines, and the macro expander. Furthermore, the modifications 
which must be made are localized in two areas, the machine description and the C routine macro 
definitions. The implementer is primarily concerned with the correct implementation of the individual 
abstract machine instructions. The interaction among these instructions, in terms of their correct ordering 
and the use of registers and temporary locations, is handled by the code generation algorithm and need 
not be of concern to the implementer. It is this reduction in the complexity of the problem which leads 
to the increased confidence in the results of the modification. 

The portability of the compiler has been tested by the construction of a version of the compiler for the
DEC PDP-10. The initial machine description and macro definitions for the PDP-10 implementation were
written and debugged by the author in a period of two days.

4.4 Further Work

There are three main directions for further work. One is to develop machine models which will allow the
generation of acceptable code for a larger class of machines. Such machine models will have the effect of
reducing the complexity of the descriptions of machines which do not completely correspond to the
machine model described in this paper. With the HIS-6000 [illegible] area of
complexity in the machine description is that of character manipulation. One would desire a machine
model which allows the implementer to describe more conveniently the manipulation of characters on
his machine. Similarly, a machine model which allows a better understanding of addressing would be
desirable.

Another direction for further work is to develop machine-independent code generation algorithms which
will produce more efficient code. In particular, the [illegible] register allocation under complex
constraints should be [illegible]. In addition, [illegible] and
safely the code generation algorithm through the addition of procedural knowledge should be developed.
Such techniques should allow the compiler to be modified to produce code for unanticipated new
machines.

The third direction for further work is to apply the technique of portable compilers to more complicated
and more powerful languages. The technique of using [illegible]
and a machine description, even aside from portability, [illegible] code
generator. It would be interesting to see if this technique could reduce the complexity of code
generators for large languages and whether portability could still [illegible]
efficiency of the object code.

Appendix I - The Machine Description

The format of the machine description is described in detail in the [illegible]
HIS-6000 machine description given in Appendix IV. [illegible]

The convention of writing syntactic alternatives on separate lines is used throughout.

1. Definition Statements 

The machine description begins with a series of definition statements. All definition statements are
described in the sections below in the order in which they should appear in the machine description.

1.1 The TYPENAMES Statement

The TYPENAMES statement defines the names which are used in the machine description to represent the
primitive C data types: character, integer, floating-point, and double-precision floating-point. The form of
the TYPENAMES statement is

<typenames_stmt>: typenames ( <name_list> ) ;
<name_list>: <name_list> , <name>
             <name>

The first name corresponds to the internal type number 0, the second with type 1, etc. Because the
internal type numbers are fixed in the compiler, the TYPENAMES statement must be (equivalent to)

typenames (char, int, float, double);

1.2 The REGNAMES Statement

The REGNAMES statement defines the names of the abstract machine registers; these registers are
assigned internal register numbers (used in REF_BASE, section [illegible]), starting with register number 0, in
the order in which they appear in the REGNAMES statement. The form of the REGNAMES statement is
similar to that of the TYPENAMES statement; for example, the REGNAMES statement used in the HIS-6000
implementation is

regnames (x0, x1, x2, x3, x4, a, q, f);

In this example, all but the F register correspond directly to actual registers on the HIS-6000: registers
X0 through X4 are the first five (out of eight) index registers, and registers A and Q are the
accumulators. The F register is a fictitious floating-point accumulator which in reality corresponds to the
combined A, Q, and E (exponent) registers. The fact that the F register conflicts with the A and Q
registers is specified in the CONFLICT statement, described below. Only those actual machine registers
which are to be used by the code generator in producing code to evaluate expressions should be included
in the REGNAMES statement; registers used only for environment pointers, auxiliary address calculations,
or other scratch calculations performed within the code for a single [illegible] should not be included in the
REGNAMES statement. For example, on the HIS-6000, three index registers are not defined in the
REGNAMES statement: X7, which contains a pointer to the current stack frame, X6, which contains a
pointer to the current argument list, and X5, which is used as a scratch [illegible]
characters.

1.3 The MEMNAMES Statement

The MEMNAMES statement associates names with the various classes of memory references as specified
by negative values of REF_BASE (Section [illegible]). The form of the MEMNAMES statement is similar to that
of the TYPENAMES statement; for example, the MEMNAMES statement used in the HIS-6000
implementation is

memnames (reg, auto, ext, stat, param, label, intlit, floatlit, stringlit, ix0, ix1, ix2, ix3, ix4, ia, iq);

The first nine names refer to predefined memory reference classes (REF_BASE = 0,-1,-2,...); the
remaining names refer to indirect references through the abstract machine registers defined in the
REGNAMES statement (REF_BASE = -9,-10,...). The first name "reg" is never used; it serves only as a
placeholder. No name is provided for indirect references through the F register since the F register is
not used to hold pointers and, being the highest numbered register, omitting it does not affect the
positions of the other names in the list.

1.4 The SIZE Statement

The SIZE statement defines the sizes of the primitive C data types in terms of bytes. The form of the
SIZE statement is

<size_stmt>: size <size_def_list> ;
<size_def_list>: <size_def_list> , <size_def>
                 <size_def>
<size_def>: <integer> ( <type_list> )
<type_list>: <type_list> , <type>
             <type>

The integers specify sizes in bytes; the types are the names of primitive C data types (as specified in the
TYPENAMES statement) with the corresponding size. For example, the SIZE statement used for the
HIS-6000 implementation is



size 1(char),4(int,float),8(double);

All addresses computed by the compiler are in terms of byte addressing; byte addresses are converted to
word addresses for non-character operations by the macro definitions. For example, on the HIS-6000, if
the first element of an integer array begins at offset 0 in the static area, then subsequent elements of
the array are at offsets 4, 8, 12, 16, etc.
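
The arithmetic involved can be sketched in C. This is an illustration, not code from the compiler or
its macro definitions; it assumes four bytes per word (consistent with the 16-byte, 4-word save area of
the HIS-6000 implementation), and the names element_offset and word_address are hypothetical.

```c
#include <assert.h>

#define BYTES_PER_WORD 4     /* assumed: four bytes per machine word */

/* Byte offset of element i of an array starting at byte offset base,
   whose elements are elem_size bytes long. */
long element_offset(long base, long i, long elem_size)
{
    return base + i * elem_size;
}

/* Convert the byte address of a word-aligned object to a word address,
   as a macro definition would for a non-character operation. */
long word_address(long byte_addr)
{
    assert(byte_addr % BYTES_PER_WORD == 0);
    return byte_addr / BYTES_PER_WORD;
}
```

For an integer array at offset 0, element 3 is at byte offset 12, which corresponds to word 3.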

1.5 The ALIGN Statement 

The ALIGN statement defines the alignment factors of the primitive C data types; these alignment factors 
are in bytes. The (byte) address of a variable with an alignment factor "n" must be zero modulo "n"; for
example, on the HIS-6000, the (byte) address of an integer must be a multiple of 4. An alignment factor
must be divisible by all smaller alignment factors; this allows the compiler to assign addresses relative to
a base which satisfies the strongest alignment restriction. The form of the ALIGN statement is similar to
that of the SIZE statement; for example, the ALIGN statement used in the HIS-6000 implementation is

align 1(char), [remainder illegible]
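
The two alignment rules stated above can be sketched in C. This is an illustration only; the
functions align_up and factors_compatible are hypothetical names, and the sample factors used with
them are assumptions rather than values taken from any machine description.

```c
#include <assert.h>

/* Round a byte offset up to the next multiple of the alignment factor n,
   so that the resulting address is zero modulo n. */
long align_up(long offset, long n)
{
    assert(n > 0 && offset >= 0);
    return (offset + n - 1) / n * n;
}

/* Check that every alignment factor is divisible by all smaller factors.
   Returns 1 if the set of factors is acceptable, 0 otherwise. */
int factors_compatible(const long *factors, int count)
{
    for (int i = 0; i < count; i++)
        for (int j = 0; j < count; j++)
            if (factors[j] < factors[i] && factors[i] % factors[j] != 0)
                return 0;
    return 1;
}
```

Because each factor divides every larger one, an address aligned for the largest factor is aligned
for all of them, which is what lets offsets be assigned relative to a single base.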

1.6 The CLASS Statement

The CLASS statement is an optional statement which allows the implementer to define classes of abstract
machine registers which are used in similar ways; the register classes so defined can then be used in the
machine description as abbreviations for the corresponding lists of registers. The form of the CLASS
statement is

<class_stmt>: class <class_def_list> ;
<class_def_list>: <class_def_list> , <class_def>
                  <class_def>
<class_def>: <name> ( <register_list> )
<register_list>: <register_list> , <register>
                 <register>

The name is the name of the register class, the registers are the names of the abstract machine registers
(as specified in the REGNAMES statement) which make up the corresponding register class. The CLASS
statement used in the HIS-6000 implementation is

class x(x0,x1,x2,x3,x4), r(a,q);

This statement defines the class of index registers X and the class of general registers R.

1.7 The CONFLICT Statement 

The CONFLICT statement is an optional statement which allows the implementer to specify abstract 
machine registers which conflict in the actual implementation. The form of the CONFLICT statement is

<conflict_stmt>: conflict <conflict_def_list> ;
<conflict_def_list>: <conflict_def_list> , <conflict_def>
                     <conflict_def>
<conflict_def>: ( <register> , <register> )

Each register pair specifies two abstract machine registers such that only one of the registers can be in
use at one time. The CONFLICT statement used in the HIS-6000 implementation is

conflict (a,f), (q,f);

which indicates that the F register conflicts with both the A and Q registers. 
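
A register allocator might consult the conflict pairs as in the following Python sketch. The set CONFLICTS and the function can_allocate are assumptions made for illustration; the paper does not describe how the compiler represents conflicts internally.

```python
# Hypothetical representation of "conflict (a,f), (q,f);" (illustration only).
CONFLICTS = {frozenset(("a", "f")), frozenset(("q", "f"))}


def can_allocate(candidate, in_use, conflicts=CONFLICTS):
    """Return True if `candidate` is free and conflicts with no busy register."""
    if candidate in in_use:
        return False
    return all(frozenset((candidate, busy)) not in conflicts for busy in in_use)
```

With A in use, the F register is unavailable but Q can still be allocated.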

1.8 The SAVEAREASIZE Statement 

The SAVEAREASIZE statement is used to specify the size of the save area which is reserved at the 
beginning of each stack frame. The save area is generally used for saving registers upon entry to a 
function, for chaining stack frames together, and for holding other per-invocation information. The form 
of the SAVEAREASIZE statement is 

saveareasize <integer> ;

The integer specifies the size (in bytes) of the save area. The save area used in the HIS-6000 
implementation is 16 bytes (4 words) long. 

1.9 The POINTER Statement 



The POINTER statement defines classes of pointers according to their resolution; these pointer classes 
represent different implementations of pointers on the target machine. The resolution of a pointer 
corresponds to the alignment factors of the objects to which it can refer; in particular, a pointer with a 
resolution of "n" bytes can refer only to objects whose alignment factors are multiples of "n" bytes. The 
primary use of pointer classes is on machines whose smallest addressable unit is larger than bytes; in this 
case, two pointer classes are defined: one which can resolve only machine-addressable units and another 
which can resolve individual bytes. By defining separate pointer classes, the implementer allows 
computations involving pointers which are known to refer to machine-addressable units to be performed 
in terms of machine-addressable units, and therefore more efficiently. The form of the POINTER
statement is

<pointer_stmt>:      pointer <pointer_def_list> ;

<pointer_def_list>:  <pointer_def_list> , <pointer_def>
                     <pointer_def>

<pointer_def>:       <name> ( <integer> )

The names define the names of the pointer classes; the integers are the resolutions of the corresponding
pointer classes. At least one and no more than four pointer classes may be defined; the pointer classes
are referred to as P0, P1, P2, and P3 in the specification of the AMOPs.

The POINTER statement used in the HIS-6000 implementation is 

pointer p0(1), p1(4);

P0 is the class of pointers to byte-aligned objects; P1 is the class of pointers to word-aligned objects.
Word pointers can be held and operated upon in the index registers; byte pointers are operated upon in
the general registers and indirected through by subroutine.
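
The resolution rule can be sketched in Python as follows. POINTER_CLASSES mirrors the HIS-6000 example; the function names, and the policy of preferring the coarsest usable class, are assumptions made for this illustration rather than a description of the compiler.

```python
# Hypothetical model of "pointer p0(1), p1(4);" (illustration only).
POINTER_CLASSES = {"p0": 1, "p1": 4}  # class name -> resolution in bytes


def usable_classes(alignment, pointer_classes=POINTER_CLASSES):
    """Classes that can refer to an object with the given alignment factor.

    A pointer with a resolution of n bytes can refer only to objects whose
    alignment factors are multiples of n bytes.
    """
    return sorted(name for name, resolution in pointer_classes.items()
                  if alignment % resolution == 0)


def coarsest_class(alignment, pointer_classes=POINTER_CLASSES):
    """Assumed policy: pick the usable class with the largest resolution."""
    return max(usable_classes(alignment, pointer_classes),
               key=lambda name: pointer_classes[name])
```

A word-aligned integer (alignment factor 4) could be addressed through either class, while a character (alignment factor 1) requires a byte pointer.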

1.10 The OFFSETRANGE Statement

The OFFSETRANGE statement is an optional statement which defines, for each pointer class defined in the
POINTER statement, the range of offsets permitted in references indirect through such a pointer (see section
2.1.1.2). The form of the OFFSETRANGE statement is

<offsetrange_stmt>:  offsetrange <offset_def_list> ;

<offset_def_list>:   <offset_def_list> , <offset_def>
                     <offset_def>

<offset_def>:        <pointer_class_name> ( <lo_bound> , <hi_bound> )

where the lo_bounds and hi_bounds are optional integers. Each offset_def specifies the range of
allowable offsets for a particular pointer class; the range is the set of integers not less than lo_bound
and not greater than hi_bound. If a bound is not present, then the range is considered unbounded in the
corresponding direction. If no range is specified for a pointer class, [illegible];
any specified range must include zero.
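
The range rules above can be expressed as a small Python sketch. The functions make_range and offset_allowed are hypothetical names; a missing bound is modelled as None.

```python
def make_range(lo=None, hi=None):
    """Build an offset range; None means unbounded in that direction.

    Any specified range must include zero, so a positive lower bound or a
    negative upper bound is rejected.
    """
    if (lo is not None and lo > 0) or (hi is not None and hi < 0):
        raise ValueError("an offset range must include zero")
    return (lo, hi)


def offset_allowed(offset, offset_range):
    """True if `offset` is not less than lo and not greater than hi."""
    lo, hi = offset_range
    return (lo is None or offset >= lo) and (hi is None or offset <= hi)
```

For instance, a range written with only a lower bound of 0 admits every non-negative offset and rejects -1.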

1.11 The RETURNREG Statement

The RETURNREG statement specifies in which registers functions returning values of various types return
those values. Registers must be specified for types INT and DOUBLE as well as for all pointer classes
defined in the POINTER statement. The form of the RETURNREG statement is

<returnreg_stmt>:   returnreg <return_def_list> ;

<return_def_list>:  <return_def_list> , <return_def>
                    <return_def>

<return_def>:       <register> ( <type_list> )

The types may be names of primitive C data types as defined in the TYPENAMES statement or names of
pointer classes as defined in the POINTER statement; the corresponding register is defined to be the
register in which functions returning values of those types return those values. For example,
the RETURNREG statement used in the HIS-6000 implementation is

returnreg q(int,p0,p1), f(double);

It is advised that pointers of all classes be returned in the same register in a compatible form to avoid
errors caused by mismatches in the declarations of functions returning pointers.
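
The requirement that registers be given for INT, DOUBLE, and every pointer class can be checked mechanically, as in this Python sketch. The dictionary RETURNREG and the function return_register_table are assumptions made for illustration only.

```python
# Hypothetical model of "returnreg q(int,p0,p1), f(double);" (illustration only).
RETURNREG = {"q": ["int", "p0", "p1"], "f": ["double"]}


def return_register_table(returnreg, pointer_classes):
    """Invert a RETURNREG description into a type -> register table.

    Raises ValueError if a type is listed twice or if INT, DOUBLE, or one of
    the pointer classes has no return register.
    """
    table = {}
    for register, types in returnreg.items():
        for type_name in types:
            if type_name in table:
                raise ValueError("duplicate return register for " + type_name)
            table[type_name] = register
    required = ["int", "double"] + list(pointer_classes)
    missing = [t for t in required if t not in table]
    if missing:
        raise ValueError("no return register for " + ", ".join(missing))
    return table
```

In this model both pointer classes map to the Q register, which matches the advice that pointers of all classes be returned in the same register.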



1.12 The TYPE Statement

The TYPE statement defines which registers are to be used in the evaluation of expressions to hold
values of the various abstract machine data types. The form of the TYPE statement is

<type_stmt>:      type <type_def_list> ;

<type_def_list>:  <type_def_list> , <type_def>
                  <type_def>

<type_def>:       <type> ( <register_list> )



The type is the name of a primitive C data type as defined in the TYPENAMES statement or the name of a
pointer class as defined in the POINTER statement; the register list specifies the abstract machine registers or
classes of abstract machine registers which may be used to hold values of the corresponding type. For
example, the TYPE statement used in the HIS-6000 implementation is

type char(r),int(r),float(f),double(f),p0(r),p1(x);

The registers specified in the TYPE statement need not include every register physically capable of
holding a particular type; only those registers which the implementer desires to use in evaluating
expressions of that type should be included in the TYPE statement. [illegible] hold such a
pointer when returned by a function [illegible] use of the [illegible].

2. The OPLOC Section

In the OPLOC section of the machine description, the AMOPs are defined in terms of the possible locations
of their operands and the corresponding locations of their results. A definition consists of a list of
items called OPLOCs; an OPLOC specifies a particular set of first operand locations, second operand
locations, and result locations. An OPLOC may also specify that one or more registers are clobbered by
the execution of the code for an abstract machine instruction; this informs the code generator that it may
be necessary to emit instructions to save the contents of the clobbered registers before the
abstract machine instruction. The forms of an OPLOC are



<loc_expr> , <loc_expr> , <loc_expr> ;

and

<loc_expr> , <loc_expr> , <loc_expr> <clobber> ;

where a clobber is a list of one or more register names separated by commas and enclosed in square
brackets. The location expressions specify the acceptable locations of the first operand, second operand,
and result, respectively. A location expression specifies either a set of registers or a set of memory
reference classes; these sets may be specified using particular registers or memory reference classes along
with the operations of union ("|") and negation ("~"). The syntax of a location expression is
<loc_expr>:       <register_expr>
                  <memory_expr>
                  1
                  2
                  <null>

<register_expr>:  <register_expr> | <register_expr>
                  ~ <register_expr>
                  ( <register_expr> )
                  <register_name>

<memory_expr>:    <memory_expr> | <memory_expr>
                  ~ <memory_expr>
                  ( <memory_expr> )
                  <memory_ref_class_name>
                  M
                  indirect

The negation operator "~" has precedence over the union operator "|". The location expressions "1" and
"2" may be used only for the location of a result; they specify that the result is placed in the first or
second operand location, respectively. Only the location expression for the second operand of a unary
AMOP may be null. The location expression "M" represents the set of all memory reference classes; the
location expression "indirect" represents the set of all indirect memory reference classes.
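
The set semantics of register expressions can be sketched with a small recursive-descent evaluator in Python. This is an illustration only: the operator characters "|" and "~" are assumed spellings of union and negation, the register universe REGISTERS is hypothetical, and eval_register_expr is not a routine of the compiler.

```python
import re

# Hypothetical universe of abstract machine registers (illustration only).
REGISTERS = frozenset({"a", "q", "f", "x0", "x1"})


def eval_register_expr(text, universe=REGISTERS, classes=None):
    """Evaluate a register expression to the set of registers it denotes.

    Negation binds more tightly than union; parentheses group.
    `classes` optionally maps class names to lists of registers.
    """
    tokens = re.findall(r"[A-Za-z0-9]+|[~|()]", text)
    classes = classes or {}
    pos = 0

    def union():
        nonlocal pos
        result = unary()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            result = result | unary()
        return result

    def unary():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "~":
            return set(universe) - unary()
        if token == "(":
            inner = union()
            pos += 1  # skip the closing parenthesis
            return inner
        return set(classes.get(token, [token]))

    return union()
```

Because negation has precedence, "~a|q" denotes every register except A, whereas "~(a|q)" excludes both general registers.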

The OPLOCs are associated with AMOPs in location definitions, which consist of a list of AMOP labels
followed by one or more OPLOCs:

<loc_def>:     <AMOP_list> <OPLOC_list>

<AMOP_list>:   <AMOP_list> <AMOP_label>
               <AMOP_label>

<AMOP_label>:  <AMOP> :

<OPLOC_list>:  <OPLOC_list> <OPLOC>
               <OPLOC>

Each AMOP in the list of AMOP labels is associated with the list of OPLOCs; each OPLOC in the list of
OPLOCs represents an acceptable set of operand/result locations for each of the AMOPs. For example,
the location definition

+d: -d: *d: /d: f,M,f;

used in the HIS-6000 machine description specifies that the AMOPs for double-precision floating-point
addition, subtraction, multiplication, and division all take their first operand in the F register, their second
operand in memory, and place their result in the F register. Another example is the location definition

=<<: =>>: M,a,q; M,q,a;

which specifies that the AMOPs left-shift-assignment and right-shift-assignment both take their first
operand in memory, their second operand in a general register, and place their result in the other general
register. A third example is the location definition

*i: /i: q,M,q[a];

which specifies that the AMOPs for integer multiplication and division both take their first operand in the 
Q register, their second operand in memory, place their result in the Q register, and clobber the contents
of the A register in the process. Note that the location definitions

+i: r,M,1;

and

+i: r,M,r;

are not equivalent. The second definition allows the code generator to emit an abstract machine
instruction which adds an integer in memory to an integer in the A register and places the result in the Q
register; the first definition requires that the result be placed in the register containing the first operand.
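
The difference between the two definitions can be made concrete with a Python sketch. The tuple representation of an OPLOC, the token "M" standing for any memory reference, and the function result_locations are all assumptions made for this illustration.

```python
# Hypothetical model: an OPLOC is (first_set, second_set, result), where
# result is either a set of locations or the integer 1 or 2.
GENERAL = frozenset({"a", "q"})   # the HIS-6000 general registers
MEMORY = frozenset({"M"})         # stand-in for any memory reference

SAME_AS_FIRST = (GENERAL, MEMORY, 1)       # models  +i: r,M,1;
ANY_GENERAL = (GENERAL, MEMORY, GENERAL)   # models  +i: r,M,r;


def result_locations(oploc, first, second):
    """Return the result locations the OPLOC permits for these operands.

    An empty set means the operand locations do not match the OPLOC.
    """
    first_set, second_set, result = oploc
    if first not in first_set or second not in second_set:
        return set()
    if result == 1:
        return {first}
    if result == 2:
        return {second}
    return set(result)
```

With the first operand in A, the "1" form forces the result into A, while the "r" form leaves the code generator free to choose A or Q.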

The OPLOC section of the machine description consists of a sequence of location definitions which define
the AMOPs of the intermediate language. (A small number of AMOPs should not be defined in the OPLOC
section of the machine description; these are indicated in Appendix II.) An AMOP may not be defined more
than once in the OPLOC section of the machine description.

3. The Macro Section

The macro section of the machine description contains the macro definitions for the AMOPs; these macro
definitions expand into the object-language statements needed to interpret the corresponding abstract
machine instructions. A macro definition consists of a list of AMOP labels followed by a list of character
string constants. The list of AMOP labels specifies that abstract machine instructions for these AMOPs are
to be emitted as macro calls which refer to this macro definition. The character strings make up the body
of the macro definition; they are written out in sequence as the expansion of a corresponding macro call.
The character strings may have optional location prefixes which specify a set of locations of the
operands and result; a character string with an attached location prefix is included in the expansion of a
macro call only if the test specified by the location prefix succeeds. A character string may contain
embedded macro calls and references to the arguments of the macro call (see Appendix VI, section 4).
The macro definition for an AMOP must correspond to the location definition for that AMOP in that correct
code must be generated for all combinations of operand/result locations that are allowed by the location
definition.

The macro definitions can refer to the AMOP and the operand/result locations by using the following
abbreviations:

abbreviation  expansion   meaning

#O            %n(#0)      symbolic representation of operation
#F            %n(#3,#4)   symbolic representation of first operand
#S            %n(#5,#6)   symbolic representation of second operand
#R            %n(#1,#2)   symbolic representation of result
#'O           #0          internal representation of operation
#'F           #3,#4       internal representation of first operand
#'S           #5,#6       internal representation of second operand
#'R           #1,#2       internal representation of result



Recall that in the intermediate language macro call representing an abstract machine instruction, the first
argument of the macro call is the AMOP opcode, and the following arguments are REFs for the result, first
operand, and second operand (see section 2.1.1.2). The macro "%n" is the implementer-defined NAME
macro which can return any convenient symbolic representation for an operation or operand/result
location; it is assumed to be implemented as a C routine called ANAME (see Appendix VI, section 4).

An example of a simple macro definition is the definition for integer addition used in the HIS-6000
machine description. The location definition is

+i: r,M,1;

and the macro definition is

+i: " AD#R #S"

This location/macro definition of the AMOP "+i" expands to produce assembly-language statements such as

ADA X        (external variable "X")
ADQ 3,DL     (literal "3")
ADA 0,2      (indirect through X2)
ADQ 5,7      (an automatic or temporary)

A more complicated macro definition is used for the AMOP ".ii" (move integer). This macro definition must
be capable of generating code to move an integer between a memory location and a general register and
from one general register to the other. Three character strings with location prefixes are used for the
three cases register-to-memory, memory-to-register, and register-to-register:

.ii:
        (r,,M):  " ST#F #R"
        (M,,r):  " LD#R #F"
        (r,,r):  " LLR 36"

The location prefixes consist of location expressions for the first operand, second operand, and result.
The operand and result locations of a particular macro call are compared to the location expressions in
the location prefix (comparisons with a null location expression always succeed); if all comparisons
succeed, the corresponding character string is included in the macro expansion.
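
The selection of character strings by location prefix can be sketched in Python. Everything here is an assumption made for illustration: the list MOVE_INT transcribes a reading of the move-integer macro, the function kind classifies a location crudely as general register ("r") or memory ("M"), and expand performs only the #F and #R substitutions.

```python
# Hypothetical model of a macro body with location prefixes (illustration only).
# Each entry is ((first, second, result), text); None is a null expression.
MOVE_INT = [
    (("r", None, "M"), " ST#F #R"),
    (("M", None, "r"), " LD#R #F"),
    (("r", None, "r"), " LLR 36"),
]


def kind(location):
    """Classify a location: general register A or Q, otherwise memory."""
    return "r" if location in ("A", "Q") else "M"


def expand(macro, first, result, second=None):
    """Return the strings whose location prefix matches the actual locations."""
    expansion = []
    actual = (first, second, result)
    for prefix, text in macro:
        matches = all(
            wanted is None or (location is not None and kind(location) == wanted)
            for wanted, location in zip(prefix, actual)
        )
        if matches:
            expansion.append(text.replace("#F", first).replace("#R", result))
    return expansion
```

Under these assumptions, moving register A to the memory location X selects only the first string and yields " STA X".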

The macro section of the machine description may also contain definitions of named macros; these may be
keyword macros (see section [illegible]) or implementer-defined macros which are called in the definitions of
other macros. A named macro is defined by using the name of the macro in place of an AMOP in the
label(s) preceding the body of the macro definition. A single macro definition may have both AMOP and
macro name labels; this is useful when it is desired that the definition of one abstract machine instruction
itself contain another abstract machine instruction, since the internal names used to refer to the macro
definitions of AMOPs are not accessible to the writer of the machine description. An example of a
keyword macro definition in the HIS-6000 machine description is

en: " SYMREF #0"

The argument to the ENTRY macro is an assembler symbol as produced by the IDN macro (see Appendix [illegible]).



The macro section of the machine description consists of the reserved word "macros" followed by a
sequence of macro definitions. Macro definitions must be provided for most of the AMOPs of the
intermediate language (exceptions are indicated in Appendix II) and for all of the keyword macros of the
intermediate language which are not defined by C routines. An AMOP or a macro name may not be
defined more than once in the macro section of the machine description.



Appendix II - The Intermediate Language

The operations of the abstract machine are represented in the intermediate language as three-address
instructions; the operators of these instructions (the AMOPs) are listed
in the tables below. For each AMOP is listed its opcode (in octal), its symbolic representation in the
machine description, the types of its operands and result, and the basic operation
involved. The type entry consists of a list of types for the first operand, second operand (if any), and
result of an AMOP, in that order; the types are taken from the following list of abbreviations:

c    character
i    integer
f    floating-point
d    double-precision floating-point
x    any type
p    any pointer
p0   class 0 pointer
p1   class 1 pointer
p2   class 2 pointer
p3   class 3 pointer
l    a location (the result of a jump)

The following notes are referenced in the AMOP tables:

1 - This AMOP should be defined only if the corresponding pointer classes are defined.

2 - The definition of this AMOP is optional.

3 - OPLOCs should not be specified for this AMOP.

4 - This AMOP is used only in the tree representation of expressions internal to the code
generation phase; it should not appear [illegible].

5 - This AMOP causes a side-effect [illegible];
therefore, all OPLOCs for this AMOP must specify memory as the location of the (first)
operand.
Unary Abstract Machine Operators

opcode  symbol  types   notes  basic operation

0000    -ui     i,i            unary minus
0001    -ud     d,d            unary minus
0002    ++bi    i,i     5      pre-increment
0003    ++ai    i,i     5      post-increment
0004    --bi    i,i     5      pre-decrement
0005    --ai    i,i     5      post-decrement
0006    .BNOT   i,i            bitwise negation
0007    !       x,i     4      truth-value negation
0012    .sw                    switch
0013    ++bc    c,i     5      pre-increment
0014    ++ac    c,i     5      post-increment
0015    --bc    c,i     5      pre-decrement
0016    --ac    c,i     5      post-decrement
0017    &u0     x,p0           address of
0020    &u1     x,p1    1      address of
0021    &u2     x,p2    1      address of
0022    &u3     x,p3    1      address of
0023    *u      p,x     4
0024    ==0p0   p0,l    2      jump on null pointer
0025    ==0p1   p1,l    1,2    jump on null pointer
0026    ==0p2   p2,l    1,2    jump on null pointer
0027    ==0p3   p3,l    1,2    jump on null pointer
0030    !=0p0   p0,l    2      jump on non-null pointer
0031    !=0p1   p1,l    1,2    jump on non-null pointer
0032    !=0p2   p2,l    1,2    jump on non-null pointer
0033    !=0p3   p3,l    1,2    jump on non-null pointer



Conversion Abstract Machine Operators

opcode  symbol  types   notes  basic operation

0040    .ci     c,i            convert c to i
0041    .cf     c,f            convert c to f
0042    .cd     c,d            convert c to d
0043    .ic     i,c            convert i to c
0044    .if     i,f            convert i to f
0045    .id     i,d            convert i to d
0046    .ip0    i,p0           convert i to p0
0047    .ip1    i,p1    1      convert i to p1
0050    .ip2    i,p2    1      convert i to p2
0051    .ip3    i,p3    1      convert i to p3
0052    .fc     f,c            convert f to c
0053    .fi     f,i            convert f to i
0054    .fd     f,d            convert f to d
0055    .dc     d,c            convert d to c
0056    .di     d,i            convert d to i
0057    .df     d,f            convert d to f
0060    .p0i    p0,i           convert p0 to i
0061    .p0p1   p0,p1   1      convert p0 to p1
0062    .p0p2   p0,p2   1      convert p0 to p2
0063    .p0p3   p0,p3   1      convert p0 to p3
0064    .p1i    p1,i    1      convert p1 to i
0065    .p1p0   p1,p0   1      convert p1 to p0
0066    .p1p2   p1,p2   1      convert p1 to p2
0067    .p1p3   p1,p3   1      convert p1 to p3
0070    .p2i    p2,i    1      convert p2 to i
0071    .p2p0   p2,p0   1      convert p2 to p0
0072    .p2p1   p2,p1   1      convert p2 to p1
0073    .p2p3   p2,p3   1      convert p2 to p3
0074    .p3i    p3,i    1      convert p3 to i
0075    .p3p0   p3,p0   1      convert p3 to p0
0076    .p3p1   p3,p1   1      convert p3 to p1
0077    .p3p2   p3,p2   1      convert p3 to p2
Binary Abstract Machine Operators

opcode  symbol  types     notes  basic operation

0100    +i      i,i,i            addition
0101    =+i     i,i,i     2,5    addition-assignment
0102    +d      d,d,d            addition
0103    =+d     d,d,d     2,5    addition-assignment
0104    -i      i,i,i            subtraction
0105    =-i     i,i,i     2,5    subtraction-assignment
0106    -d      d,d,d            subtraction
0107    =-d     d,d,d     2,5    subtraction-assignment
0110    *i      i,i,i            multiplication
0111    =*i     i,i,i     2,5    multiplication-assignment
0112    *d      d,d,d            multiplication
0113    =*d     d,d,d     2,5    multiplication-assignment
0114    /i      i,i,i            division
0115    =/i     i,i,i     2,5    division-assignment
0116    /d      d,d,d            division
0117    =/d     d,d,d     2,5    division-assignment
0120    %       i,i,i            modulo
0121    =%      i,i,i     2,5    modulo-assignment
0122    <<      i,i,i            left-shift
0123    =<<     i,i,i     2,5    left-shift-assignment
0124    >>      i,i,i            right-shift
0125    =>>     i,i,i     2,5    right-shift-assignment
0126    &       i,i,i            bitwise AND
0127    =&      i,i,i     2,5    bitwise AND-assignment
0130    ^       i,i,i            bitwise XOR
0131    =^      i,i,i     2,5    bitwise XOR-assignment
0132    .OR     i,i,i            bitwise OR
0133    =OR     i,i,i     2,5    bitwise OR-assignment
0134    &&      x,x,i     4      truth-value AND
0135    .TVOR   x,x,i     4      truth-value OR
0136    -p0p0   p0,p0,i          pointer subtraction
0137    =       x,x,x     4      assignment
0146    +p0     p0,i,p0          increment pointer by
0147    +p1     p1,i,p1   1      increment pointer by
0150    +p2     p2,i,p2   1      increment pointer by
0151    +p3     p3,i,p3   1      increment pointer by
0152    -p0     p0,i,p0          decrement pointer by
0153    -p1     p1,i,p1   1      decrement pointer by
0154    -p2     p2,i,p2   1      decrement pointer by
0155    -p3     p3,i,p3   1      decrement pointer by
Abstract Machine Operators, continued

opcode  symbol  types     notes  basic operation

0160    .cc     c,c       3      move character
0161    .ii     i,i       3      move integer
0162    .ff     f,f       3      move float
0163    .dd     d,d       3      move double
0164    .p0p0   p0,p0     3      move pointer p0
0165    .p1p1   p1,p1     1,3    move pointer p1
0166    .p2p2   p2,p2     1,3    move pointer p2
0167    .p3p3   p3,p3     1,3    move pointer p3
0171            x,x,x     4      conditional
0172            x,x,x     4      conditional
0200    ==i     i,i,l            jump on equal
0201    !=i     i,i,l            jump on not equal
0202    <i      i,i,l            jump on less than
0203    >i      i,i,l            jump on greater than
0204    <=i     i,i,l            jump on less than or equal
0205    >=i     i,i,l            jump on greater than or equal
0206    ==d     d,d,l            jump on equal
0207    !=d     d,d,l            jump on not equal
0210    <d      d,d,l            jump on less than
0211    >d      d,d,l            jump on greater than
0212    <=d     d,d,l            jump on less than or equal
0213    >=d     d,d,l            jump on greater than or equal
0214    ==p0    p0,p0,l          jump on equal
0215    !=p0    p0,p0,l          jump on not equal
0216    <p0     p0,p0,l          jump on less than
0217    >p0     p0,p0,l          jump on greater than
0220    <=p0    p0,p0,l          jump on less than or equal
0221    >=p0    p0,p0,l          jump on greater than or equal
0222    ==p1    p1,p1,l   1,2    jump on equal
0223    !=p1    p1,p1,l   1,2    jump on not equal
0224    <p1     p1,p1,l   1,2    jump on less than
0225    >p1     p1,p1,l   1,2    jump on greater than
0226    <=p1    p1,p1,l   1,2    jump on less than or equal
0227    >=p1    p1,p1,l   1,2    jump on greater than or equal
0230    ==p2    p2,p2,l   1,2    jump on equal
0231    !=p2    p2,p2,l   1,2    jump on not equal
0232    <p2     p2,p2,l   1,2    jump on less than
0233    >p2     p2,p2,l   1,2    jump on greater than
0234    <=p2    p2,p2,l   1,2    jump on less than or equal
0235    >=p2    p2,p2,l   1,2    jump on greater than or equal
0236    ==p3    p3,p3,l   1,2    jump on equal
0237    !=p3    p3,p3,l   1,2    jump on not equal
0240    <p3     p3,p3,l   1,2    jump on less than
0241    >p3     p3,p3,l   1,2    jump on greater than
0242    <=p3    p3,p3,l   1,2    jump on less than or equal
0243    >=p3    p3,p3,l   1,2    jump on greater than or equal
Abstract Machine Operators, continued

opcode  symbol  types     notes  basic operation

0260    ++bp0   p0,i,p0   5      pre-increment by
0261    ++ap0   p0,i,p0   5      post-increment by
0262    --bp0   p0,i,p0   5      pre-decrement by
0263    --ap0   p0,i,p0   5      post-decrement by
0264    ++bp1   p1,i,p1   1,5    pre-increment by
0265    ++ap1   p1,i,p1   1,5    post-increment by
0266    --bp1   p1,i,p1   1,5    pre-decrement by
0267    --ap1   p1,i,p1   1,5    post-decrement by
0270    ++bp2   p2,i,p2   1,5    pre-increment by
0271    ++ap2   p2,i,p2   1,5    post-increment by
0272    --bp2   p2,i,p2   1,5    pre-decrement by
0273    --ap2   p2,i,p2   1,5    post-decrement by
0274    ++bp3   p3,i,p3   1,5    pre-increment by
0275    ++ap3   p3,i,p3   1,5    post-increment by
0276    --bp3   p3,i,p3   1,5    pre-decrement by
0277    --ap3   p3,i,p3   1,5    post-decrement by
Appendix III - The Intermediate Language: Keyword Macros

The keyword macros of the intermediate language are described below in alphabetical order. Each 
section is headed by the name of a macro and its calling sequence; following is a description of the 
arguments and the intended function of the macro call. 

1. ADCONn: %An(NAME) [n=0,1,2,3]

This is a set of macros, one for each possible pointer class. NAME is an object-language symbol
constructed from an identifier by the IDN macro. The expansion of an ADCONn macro should define a
pointer constant of pointer class "n" which points to the external variable or function with the given
name. This macro is used in the initialization of static and external pointers and arrays of pointers.

2. ALIGN: %AL(N)

N is an integer specifying the CTYPE (an internal type specification) of an object for which the 
appropriate alignment of the location counter must be made. The relevant CTYPEs are: 

value ctype 

2 char 

3 int 

4 float 

5 double 
6-9 pointer 

The expansion of the macro call should be the pseudo-operations needed (if any) to properly align the
location counter. This macro is used in the initialization of static and external variables.
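
The effect of aligning the location counter can be shown with a one-function Python sketch. The function align_location_counter is a hypothetical name; the real macro emits assembler pseudo-operations rather than computing an address itself.

```python
def align_location_counter(counter, alignment):
    """Return the smallest multiple of `alignment` that is >= `counter`."""
    if alignment <= 0:
        raise ValueError("alignment factor must be positive")
    # Ceiling division by `alignment`, then scale back up.
    return -(-counter // alignment) * alignment
```

For example, with an alignment factor of 4 a location counter of 5 advances to 8, while a counter already at 8 is left unchanged.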

3. CALL: %CA(NARGS,ARGP,0,FBASE,FOFFSET)

The CALL macro generates a function call. NARGS is an integer specifying the number of arguments to 
the function call; ARGP is an integer specifying the byte offset in the caller's stack frame of the 
arguments which have been so placed by previous instructions. FBASE and FOFFSET are integers which 
together make up a REF specifying the location of the function being called (it may be indirect through a
pointer in a register); these are passed as arguments 3 and 4 of the macro call so that they may be 
referenced as #F in the macro definition. 

4. CHAR: %C(I)

The CHAR macro produces a definition of a character constant whose value is the integer I; it is used in 
the initialization of static and external characters and arrays of characters. When producing code for an 
assembler which does not have a byte location counter (for example, the HIS-6000 assembler GMAP), the 
characters produced by CHAR macro calls must be stored in a buffer until either enough are accumulated 
to fill a machine word or a macro call other than CHAR is issued; in this case, all macros which may follow 
a CHAR macro must first check to see if there are any characters in the buffer and if so, print the 
appropriate statement and clear the buffer. 
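
The buffering scheme described for assemblers without a byte location counter can be sketched in Python. The class CharBuffer, its method names, the word size of four bytes, and the zero padding of a partly filled word are all assumptions made for this illustration.

```python
class CharBuffer:
    """Hypothetical model of CHAR buffering (illustration only)."""

    def __init__(self, bytes_per_word=4):
        self.bytes_per_word = bytes_per_word
        self.pending = []   # characters not yet written out
        self.output = []    # statements emitted so far

    def char(self, value):
        """Handle a CHAR macro call: buffer until a full word is available."""
        self.pending.append(value)
        if len(self.pending) == self.bytes_per_word:
            self.flush()

    def flush(self):
        """Emit any buffered characters as one word (zero padded)."""
        if self.pending:
            padding = [0] * (self.bytes_per_word - len(self.pending))
            self.output.append(("word", tuple(self.pending + padding)))
            self.pending = []

    def other(self, statement):
        """Handle any other macro call: flush first, then emit the statement."""
        self.flush()
        self.output.append(("stmt", statement))
```

Five CHAR calls followed by another macro call produce one full word, one padded word, and then the other macro's statement.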

5. DOUBLE: %D(I)

The DOUBLE macro produces a definition of a non-negative double-precision floating-point constant 
whose C source representation is stored in the internal compiler table CSTORE at an offset specified by 
the integer I (the compiler itself does not use any floating-point operations). This macro is used in the 
initialization of static and external double-precision floating-point variables and arrays. 
6. END: %END()

The END macro marks the end of the intermediate language program. It may produce an END statement, if
needed, or signal that any processing associated with the end of the program should be performed.

7. ENTRY: %EN(NAME)

NAME is an object-language symbol constructed from an identifier by the IDN macro. The expansion of
the ENTRY macro should define the symbol as an entry point, that is, one which is defined in the current
program but accessible to other programs.

8. EPILOG: %EP(FUNCNO,FRAMESIZE)

The EPILOG macro produces the epilog code for a C function. The epilog code should restore the
environment of the calling function and return to that function. In the HIS-6000 implementation, these
actions are performed by a subroutine. FUNCNO and FRAMESIZE are integers which specify the internal
function number of the function and the size in bytes of its stack frame, respectively. In the HIS-6000
implementation, these integers are used to define an assembly-language symbol whose value is the size in
words of the stack frame; this symbol is used by the code produced by the PROLOG macro [illegible]
the stack frame.

9. EQU: %EQ(NAME)

NAME is an object-language symbol constructed from an identifier by the IDN macro; it is to be defined as
having a value equal to the current value of the location counter.

10. EXTRN: %EX(NAME)

The EXTRN macro is similar to the ENTRY macro except that it defines the symbol to be an external
reference, that is, one which is used in the current program but assumed to be defined in another
program.

11. FLOAT: %F(I)

The FLOAT macro produces a definition of a non-negative single-precision floating-point constant; the
argument has the same interpretation as that of the DOUBLE macro.

12. GOTO: %GO(0,BASE,OFFSET)

The GOTO macro produces an unconditional jump to a location denoted in the source program by a label
constant or expression. BASE and OFFSET together make up a REF which specifies the target of
the jump; these are passed as arguments 1 and 2 of the macro call so that they may be referenced as #R
in the macro definition.

13. HEAD: %HD()

The HEAD macro marks the beginning of the intermediate language program. It may produce header
statements, if needed, or signal that any initialization processing should be performed.

14. IDN: %I(X)

The IDN macro should expand to the object language representation of the identifier whose C source
representation is stored in the internal compiler table CSTORE at an offset specified by the integer X.
The processing performed by this macro may include the truncation of long names, the replacement of the
underline character (which is allowed in C identifiers), and the insertion of special character(s) to avoid
conflicts between C identifiers and other object language symbols.
15. INT: %IN(I)

The INT macro produces a definition of an integer constant whose value is specified by the integer I. It
is used in the initialization of static and external integers and arrays of integers and in the construction
of tables for the LSWITCH macro.

16. LABCON: %LC(N)

The LABCON macro generates an address constant whose value is the address corresponding to internal
label number N. The LABCON macro is used to construct the tables for the LSWITCH and TSWITCH
macros.

17. LABDEF: %L(N)

The LABDEF macro defines the location of internal label number N.

18. LN: %LN(N)

The LN macro associates the line in the source program whose line number is specified by the integer N
with the current value of the location counter. This macro may optionally produce a comment line in the
object program to aid in the reading of the object program, or it may define a line-number symbol to be
used in conjunction with a debugging system.

19. LSWITCH: %LS(N,LBASE,LOFFSET,IBASE,IOFFSET)

The LSWITCH macro should generate code which jumps according to the value of the integer whose
location is given by IBASE and IOFFSET (selected from the locations permitted by the OPLOC for the .sw
operation). This macro is immediately followed by N INT macros (the cases) which are immediately
followed by N LABCON macros (the corresponding labels). A search should be made through the case list;
if a match is found, a jump should be made to the corresponding label. If
the integer matches none of the list entries, then a jump should be made to the internal label defined by
LBASE and LOFFSET.
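
The behaviour expected of the generated code can be modelled by a short Python function. The function lswitch and its argument names are hypothetical; labels are represented as plain strings, and the search is written as a sequential scan, which is an assumption about the search order.

```python
def lswitch(value, cases, labels, default_label):
    """Model of the LSWITCH search: return the label to jump to.

    `cases` are the N case values and `labels` the N corresponding labels.
    """
    if len(cases) != len(labels):
        raise ValueError("each case needs exactly one label")
    for case, label in zip(cases, labels):
        if value == case:
            return label
    return default_label
```

A value that appears in the case list selects its label; any other value selects the default label.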

20. NDOUBLE: %ND(I)

The NDOUBLE macro is the same as the DOUBLE macro except that the value of the defined constant is
made negative.

21. NFLOAT: %NF(I)

The NFLOAT macro is the same as the FLOAT macro except that the value of the defined constant is made
negative.

22. PROLOG: %P(FUNCNO,FUNCNAME)

The PROLOG macro produces the prolog code for a C function. FUNCNAME is an integer representing the
name of the function as it appears in the source program; its interpretation is the same as that of the
argument of the IDN macro. FUNCNO is an integer which specifies the internal function number of the
function; it may be used in conjunction with the EPILOG macro to access the size of the function's stack
frame. The PROLOG macro should define the entry point name and produce the code necessary to save
the environment of the calling function and to set up the environment of the called function using the
information provided in the function call. In the HIS-6000 implementation, these actions are performed by
a subroutine. The PROLOG macro call appears in the intermediate language program immediately before
the first instruction of the corresponding function.



23. RETURN: %RT()

The RETURN macro produces the statements needed to return from a function to the calling function. In
general, this macro will result in a transfer to the EPILOG code. The returned value of the function is
loaded by preceding macro calls into the appropriate register as specified in the RETURNREG statement of
the machine description.

24. STATIC: %ST(N,S)

The STATIC macro defines the location of the static variable whose internal static variable number is N;
S is the size of the static variable in bytes. Typically, this macro will define an assembly language symbol
by which the static variable can be referenced.

25. STRCON: %SC(N)

The STRCON macro should generate a character pointer which points to the string constant whose
internal string number is N. The STRCON macro is used in the initialization of static and external
variables.

26. STRING: %STR()

The STRING macro marks the place in the object program where the string constants should be defined.
In the HIS-6000 implementation, this macro is implemented by a C routine (ASTRING in Appendix V).

27. TSWITCH: %TS(LO,LBASE,LOFFSET,IBASE,IOFFSET,HI)

The TSWITCH macro produces an indexed jump based on the value of the integer whose location is given
by IBASE and IOFFSET (selected from the locations permitted by the OPLOC for the sw operation). This
macro is immediately followed by a sequence of HI-LO+1 LABCON macros defining the labels
corresponding to integer values from LO to HI. Values outside this range should cause a jump to the
internal label defined by LBASE and LOFFSET.

28. ZERO: %Z(I)

The ZERO macro specifies the definition of a block of storage initialized to zero; the size in bytes of this
storage area is specified by the integer I.
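
The two switch macros above differ in how the generated code selects a label: LSWITCH compares the
integer against a list of case values, while TSWITCH indexes a table of HI-LO+1 labels. The following is
a minimal sketch of the two selection rules in present-day C, not the code emitted by the compiler; the
function names are hypothetical and labels are modelled as plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* LSWITCH-style selection: search the n case values in order.
   Returns the label paired with the matching case, or deflt
   (the LBASE/LOFFSET label) if no case matches.
   Hypothetical helper, for illustration only. */
int lswitch_select(int value, const int *cases, const int *labels,
                   size_t n, int deflt)
{
    for (size_t i = 0; i < n; i++)
        if (cases[i] == value)
            return labels[i];
    return deflt;
}

/* TSWITCH-style selection: labels[] holds hi-lo+1 entries, one for
   each integer from lo to hi. Values outside the range select deflt. */
int tswitch_select(int value, int lo, int hi, const int *labels, int deflt)
{
    assert(lo <= hi);
    if (value < lo || value > hi)
        return deflt;
    return labels[value - lo];
}
```

The table form costs one range check and one indexed jump regardless of the number of cases, but needs a
table entry for every value between LO and HI; the list form needs no entries for unused values.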
Appendix IV - The HIS-6000 Machine Description

The machine description used in the HIS-6000 implementation is listed below. Much of its complexity is a
direct result of the fact that the HIS-6000 is not byte-addressed. In the macro definitions, the character
sequence '\n' represents the newline character.



typenames (char,int,float,double);
regnames (x0,x1,x2,x3,x4,a,q,f);
memnames (rag ,auto#xt>tat4>araiislabel^ntlit,f to«tlit ) stririglit a ixO l jit 1,1x2,1) 
size 1(char),4(int,float),8(double);
align 1(char),4(int,float),8(double);
class x(x0,x1,x2,x3,x4),r(a,q);
conflict (a,f),(q,f);
saveareasize 16;
pointer p0(1),p1(4);
returnreg q(int,p0,p1),f(double);
type char(r),int(r),float(f),double(f),p0(r),p1(x);

sw: a„l[x4J 

♦pO: -pO: -H:-i*u^ .OR: -pOpO: «: »: rMl; 

♦Pis MMx; 

-pi: *A,U 

-+i: «ft: -A: -OR: . M^r,l; 

*i:/i: <*Mtf*l 




+d: -&. «d: /d: 
X: 

a«: ■»: 

&u: 



tMf; 



.BNOT: .ic: xi: 
" -ui: ~bi: 
xf: xd: .if: .id: 
.fc: .dc: .fi: .di: 
.fd: 

.df: -ud: 
.ipO: .pOi: 



•uto|ext(st«tMringlit|i«|i<wJ 



t»qs 
M,fj 
V? 
Ms 

r w xj 



•ipl: .pOpl: 



.pli: .plpO: 



++bi: 

+*ai: ~ai: ++bc: ++ac: 
--be: — ac: 



Ms; 



♦♦bp: ~bp: 
++ep: ~ap: 



M.aCqt 

mm 

WML** 
MMxj 



--0: M>: <0: >0: <-0j >-0: 
--p: !«p: <p: >p: <"p: >-p: 



r|f»n 

rtxMMi 
macros
,«w: " 

xi: 
.cc: 

(auto H ): 
(•tat*):/ 
0«»q>: " 

«q»a):- 



TSX5 
"\\" 



EA»R 

EAaR 

STA 

LOQ 

STQ 

LDA 



SWTQT 



0,7\n" 

.STAT\«" 
.TEMP 
VfIMP\tf 
,tt*# 
.TEMP\n" 



(auto|stat|indiract„): 
"%if<*XaT), ADa* tco(»T)V%) 

tsxs mxar 

(axtlstringlit,,): 

" LDaR aF 

•RRL 27" 

(r„r)s " EAaR 0,aFL 

•RRL 18" 
(r„auto|stat|indiract|«trinfttt)! 

EAXS 0,aFL\n" 

(r„auto):" EAaF ^ 
(r^tat):" EAaF 
(r M auto|st.t): "Wf<*o(# , Rfc 

TSX4 



<r„strin|lit): 
<r*ext>:" 



(q N ia): 
tint**'*), 

*Xif(%o<a'R), 



.ii: 

<r„M>; " 
(H4*r>: " 
(r w r) 5 " 

.ff: 

<f„M>; " 
(MJh " 

.dd: 

<f„M>: " 
(M„f): " 

.pOpO: 
<<V>: " 

<l4r>:" 



EAaF 
TSX4 
•FLS 
STaF 
•FRL 

ADA 
TSX4 

ADQ 
TSX4 



STaF 
LDaR 

LLR 



FSTR 
FLD 



DFST 
DFLD 



LLR 

STaF 

LDaR 



0.7\n" 
.STAT\r»" 

ag§f te^mm 

aR ' 

.arroc* 

27 

aft' ■ ■'■ 

2r 

VcKa^n,) 
.ATOC" 

1tco(« , R)\n,) 
.QTOC" 



aR" 
aT 

36" 



aR" 
aF" 



aR" 

.aF"' 



36" 
aR* 
af" 



.plpl: 

<x„x): " EA«R 0,»3" 

(*JA>. m STZ «R 

ST*»F «R" 

(K^x):" LO»R *F" 

(x„q):" EAQ 0,»3" 

(qnxh" EAuR 0.QU" 

(M^q):" LDQ «F" 

<qnM): " STQ *R" 

.pOpl: 

<r„x): " EA«R 0,«FU" 

<M^>:" LD«R «T 

.plpO: 

<x„r>:" EA»R 0,«3" 

<M„r):" LD#R «F" 

•ic: " AN«F -0377.DL" 

.ipO: 

(M„r): " LD*R «F" 

<r*r>: -\\" 

.ipl: 

<r»x): " EA«R 0,«FU" 

.pOi: 

<M»r>: " LO«R «F" 

(r B r): "\\- 

•pli: 

<x„r>: " EA*R 0,»3" 

•fd: " FLD «F" 

.df: "\\" 

.cf : .cd: .if : .id: " LDQ OJOL 

IDE «35B25,DU 
FNO" 

.fi: .di: " UFA -71B25.DU" 

.fc: .dc: " UFA -71B25,DU 

ANQ -0377,01" 

AD»R «S" 

-i:" SB«R »S" 

MPY «S" 

/i'.%i m DIV «S" 

♦d: " OFAO «S" 

-d: " DFSB tS" 

*d:" DFMP »S" 

/d: " DFOV 

AS»S »R" 



Ontlit,):" 


•FRS 




(.Hftttit * 


LXL5 


' «S ' 




.•PUS' 




«: 






Cintlit,): " 


•FLS 




(/^intlit.): " 


LXL5 






•FLS 


03" 




LD*R 


•F 




•prs 


0,«SL 




ST«« 


■ «F" ■ 


\- ■ 


LO«R 


•F 




•RLS 


0,«SL 




ST»R 


•r 


♦pO: " 


•FRS 


u 




AD«F 


•s 




•FLS 


16" 


-pO: " 


•FRS 


16 




SB«F 


•s 




•FLS 


16" 


♦pi: " 


LXL«1 


•S 




AOL«R 


•F" 


-pi:" 


QLS 


18 




STQ 


.TEMP 




S8L*F 


.TEMP" 


-ui: " 


LC»R 


•r 


— bh" 


LD«R 


•P ' 




S8«R 


-i,DL 




ST«R 


•r ' 


-ud: * , 


FNEG" 




♦+bi: " 


AOS 


•F" 


♦♦•1: " 


LD«R 


•F 




AOS 


•r 


— •«: " 


LOA 


«F 




LOQ 


«F\n" 




SBQ 


-1.0L 




STQ 


: «f* ■ .■ 


U):" 


SBA 


•IjiDL 




STA 


' •F" 



++bp: 



U):" 


LD*R 


«F 




EA«R 






ST«R 


•r 


U): " 


LD#R 


•F 




ADLoR 


XccK.-S) 




ST«R 


•r 


"Op; 
U): " 






LD«R 


•F 




EA»R 


-Xo<»*S)/4,*l 
•F" 




ST«R 


Uh " 


LD«R 


»F 




SBL«R 


Xco(«'S) 




ST«R 


«r 


++«p: 

U):" 






LD»R , 


. «F 




EAX5 






S 1 AO 


»r 




LDA 


•F 




LDQ 


•F\n" 


U):" 


ADLQ 


Xco<i»'S) 




STQ 


•r 


<»q): " 


ADLA 






STA 


•r 


~«p: 






<»x): " 


LD»R 


•F 




EAX5 






STX5 


•F" 


<*«|q>: " 


LDA 


»F 




LDQ 


•F\n" 




SBLQ 


Xco(»'S) 




STQ 


»r 




SBLA 


teo^'S) 




STA 


•F" 


.BNOT: " 


ER«F 


—1" 


4u: 






(ia|iq„r): 






-Xif(«0(»'F), 


ADL»F 


Xco(»T)\rO\V 
36" 


<i«»q>: " 


LLR 


<iq»a): " 


LLR 


36" 


(auto|stat„r): " 


EA«R 


%n<*3,0) 


*if(Xo(«T), 


ADL»R 


Xco{»T)\n,)\V 


(eyt|stringlit w r): " 
<»x>: " 


EA»R 


•r 


EA»R 


•F" 



-A: " 
A:" 
-a: - 
.OR: " 
■OR: " 



AN»F 

ANS»S 

ER«F 

ERS»S 

OR*F 

ORS«S 



•S" 
•F" 
•S" 

•r 
»s* 

•F" 
==p: "


CMP*F 


*S 




TZE 


•R" 


!-p: " 


CMP»F 


♦S 




TNZ 


•R" 


<p: " 


CM»*F 


«s 




TZE 


*+2 




TNC 


•R" 


>p: " 


CMP«F 


. *s. 




T2E 


**2 




TRC 


#R" 


<-p: " 


CMP«F 


•S 




TZE 


•R 




TNC 


•R" 


>. p: - 


CMPttF 


«S 




TRC 


*r 



Uh " DFCMP -000\n" 

<*r): " CMP«R 0#L\w" 

TpOpO:" S8UF «s 

«FRl 16" 

W:"t GMAP* 

jmp:" TRA «0" 

o: "ml" 

•n: " SYMDEF «0" 

•x: " SYMREF «0" 

» t! " SYMREF JmWlA.TBI»4WreH 

symref xrroMmx^T«c^@c 

.STAT EQU *" 



p: "Xidn(«l) 


EQU 


* . 




TSXO 


.PROLG 




ZERO 


.FS«0" 


CO: 


"-V20/«l,16/0" 


cr. " 


TSX1 


•F 




ZERO 




rt: " 


TRA 


•EPILG" 


•p: " 


TRA 


.EPILG 


.FS«0 


EQU 


•1/4" 


go: " 


TRA 


•R" 



epq: 

(auto*): " EAQ 
<stat„): " EAQ 
(ia„):" LLR 
(«uto|«t«t|indir«ctJ: 
"*f<Xo<»T), AOQ 



0,7\n" 

.STAT\n" 

36\n" 



♦+bc: 

(auto|stat|indir»ct„): 
"tcpqCOAO.sT) 



—be: 



(•xt M H 



++ac: 



"XcpqCOAO^T) 



(ext.): 



STQ 


TPLjn 

AEtJP 


LDA 


#TEMP 


TSXD 


.WTUA 


An* 

ADA 


lfDL 


ANA 


-0377,01 


EAX5 


0.AL 




.QTQCr 


IDA 


■/•?'';■'. 


AOA 


-OI0004XJ 


STA 


•r 


>: 

STQ 


.TEW 


LDA 


.TEW 


^»**wa» 

TSX5 


.CTQA 


SBA 




ANA 


■O377.0L 


EAX5 


0,AI 


1 OAH 


.QTUC 


LDA 


•F 


SBA 


-01000.DU 


STA 

): 


•F* 


STQ 


.TEMP 


LDA 


.TEMP 


TSX5 


.CTOA 


EAX5 


ML 


TSX4 


.QTOC" 


LDA 


•F 


LDQ 


•F 


AOQ 


■O10Q0.DU 


STQ 


•F" 
Appendix V - The HIS-6000 C Routine Macro Definitions



The C routine macro definitions used in the HIS-6000 implementation are listed on the following pages. A
C routine macro definition is written as a C function returning a character string value. This character
string is "substituted" for the macro call and rescanned by the macro expander; thus, it may contain
references to its arguments and embedded macro calls. The function is called with two arguments, ARGC
and ARGV: ARGC is an integer specifying the number of arguments given in the
associated macro call; ARGV is an array of pointers to those arguments.

When the following routines were written, the formatted print routine PRINT was capable of producing
output only onto a file and not into a string in core; thus, where formatting is necessary, the routines
print their output directly and return the null string. Although there are dangers inherent in this practice,
in these cases the effect is the same as if the formatted string were returned and printed normally. The
character sequences '\t', '\n', and '\\' represent tab, newline, and backslash, respectively.
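
The calling convention described above can be sketched in present-day C as follows. The routine ASIGN
and the macro it implements are hypothetical and do not appear in the HIS-6000 implementation; the sketch
only illustrates that the returned string may refer to the arguments of the call ('#1', '#2', ...) and is
rescanned after being returned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C routine macro "sign": argument 0 is an integer.
   The result names argument 1, 2 or 3 of the same macro call; the
   macro expander would substitute that argument when it rescans
   the returned string. */
const char *asign(int argc, char *argv[])
{
    if (argc < 1)
        return "";              /* null string: the call expands to nothing */
    int v = atoi(argv[0]);
    if (v < 0)
        return "#1";
    if (v == 0)
        return "#2";
    return "#3";
}
```

Returning a reference such as "#2" instead of copying the argument text keeps the routine free of buffer
management, which is the same approach taken by the AIF routine listed below.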
char *fn[]
•oth»r",-irj$ 

oth»r,aif}j 
int nfn 18,
    lineno 0,
    mflag 0,
    packb[4],
    packno;

char *aln(argc,argv) int argc; char *argv[];

    {lineno=atoi(argv[0]);
    packf();
    return(".N#0 EQU *");
    }

char *aequ(argc,argv) int argc; char *argv[];

    {packf();
return("«OEQU 



char *aint(argc,argv) int argc; char *argv[];
    {packf();
    return("\tDEC\t#0");
    }


char *achar(argc,argv) int argc; char *argv[];

    {if (argc>0) packc(atoi(argv[0]));
    return("\\");    /* conceal following newline */
    }


char *afloat(argc,argv) int argc; char *argv[];

    {packf();

if (argc>0) print("\tDEC\t«m"^toKargvC03»)j 
r»turn(""h 

} 

char *adouble(argc,argv) int argc; char *argv[];

    {
    packf();
    if (argc>0)
        {print("\tDEC\t");
        return(adblc(atoi(argv[0])));
        }
    return("");
    }

char *anegf(argc,argv) int argc; char *argv[];
    {packf();

if (argc>0) print<"\tDEC\t-ltoVoi<argv[0}))} 
return(""); 

} 

char *anegd(argc,argv) int argc; char *argv[];

    {
    packf();
    if (argc>0)
        {print("\tDEC\t-");
        return(adblc(atoi(argv[0])));
        }
    return("");
    }



char *astring(argc,argv) int argc; char *argv[];

    {auto int i,f,lc,c;
    auto char *cp;

    lc=0;    /* location counter in STRING file */
f -xopen(pname,f n^tring^REAOBINARY); 

whilaU) 

{pacKfft 

c-cgatc(f); 

if(coof(f» break; 

print(".SW\tEQU\t«\n"Jc){ 

lc++; 

whila(l) 

{if (c— T) 

{c-cgetc(f)} 
Ic+n 

if (c— XT) c-'\0V 
packc(c>, 

} 

elsa 

{packc(c)j 
if (tc) braakj 
} 

c-cgatc(f); 

} 

} 

CClOM(f)t 

return("\\")j 

char *aend(argc,argv) int argc; char *argv[];
    {packf();
    return("\tEND");
    }

char *regnames[] {"X0","X1","X2","X3","X4","A","Q","F"};
char *aname(argc,argv) int argc; char *argv[];
    {auto int base,offset;

    if (argc>1) offset=atoi(argv[1]); else offset=0;
    if (argc>0) base=atoi(argv[0]); else base=0;
    if (mflag) cprint("ANAME(%d,%d)\n",base,offset);
    if (base>=0) return(regnames[base]);
    base = -base;
    if (base >= c_indirect)

{print(^ld>ff*et/4£^ 

goto check; 

} ■ 
else switch(base) { 

case c_auto: 

-printrid v r a bffMt/4)i 
goto check; 
case c_extdef: 

returnt^*!)"); 
case c_static:

printrSTAT*ttf\offset/4H 

goto check; 
case c_param: 

print{"Xd,6>ffset/4); 

goto check; 
case c_label:

printr.LXd",offsetfc 

break; 
case c_integer:

if (offset<0 || 6ffset>3200Q) pr\m m **dT/>tt**\W 

else print("Xd^DL>ffsetH 

break; 
case c_float: 

printr«*s w ,adblc(offset)); 

break; 
case c_string: 

printrsXd^offseth 
break; 

} 

    return("");
check:
    if (offset%4) error(6025,lineno);
    return("");

} 

/**********************************

    AALIGN - align location counter

*/

char *aalign(argc,argv) int argc; char *argv[];

    {
    switch(atoi(argv[0])) {
    case ct_double:
        packf();
        return("\tEVEN");
        }
    return("\\");
    }

/**********************************

    AJC - emit conditional jump

*/

char *ajc(argc,argv) int argc; char *argv[];
    {auto int cond;
    cond=atoi(argv[0]);
    switch(cond) {
    case cc_eq0:    return("\tTZE\t#1");
    case cc_ne0:    return("\tTNZ\t#1");
    case cc_lt0:    return("\tTMI\t#1");
    case cc_ge0:    return("\tTPL\t#1");
    case cc_gt0:    return("\tTZE\t*+2\n\tTPL\t#1");
    case cc_le0:    return("\tTZE\t#1\n\tTMI\t#1");
        }
    return("");
    }

char *other(argc,argv) int argc; char *argv[];

    {switch(atoi(argv[0])) {
    case 5: return("Q");
    case 6: return("A");
        }
    return("BAD");
    }

char *aif(argc,argv) int argc; char *argv[];

    {return(atoi(argv[0])?"#1":"#2");
    }

/* PACK CHARACTERS INTO WORDS */

packc(i) int i;
    {
    packb[packno++]=i;
    if (packno>=4)

        packno=0;

    }

packf()
    {while(packno!=0) packc(0);



char *aadcon(argc,argv) int argc; char *argv[];
    {packf();
    return("\tZERO\t#0");
    }

char *azero(argc,argv) int argc; char *argv[];
    {auto int i,j;
    if (argc>0)
        {i=atoi(argv[0]);
        while(packno && i) {packc(0);i--;}
        j = i/4; i =% 4;
        if (j>0) print("\tBSS\t%d\n",j);
        while(i--)packc(0);
        }
    return("\\");
    }

char *aidn(argc,argv) int argc; char *argv[];

    {auto char *cp1,*cp2;
    static char n[7];
    auto int i,c;

    if (argc>0)
        {cp1 = &cstore[atoi(argv[0])];
        cp2 = n;
        for(i=0;i<6;i++)
            {c = *cp1++;
            if(c == '_')c='.';
            *cp2++ = c;
            }
        *cp2='\0';
        return(n);
        }
    return("");
    }

adblc(i)

    {auto char *cp1,*cp2;
    static char buf[30];
    auto int c,flag;

    flag=FALSE;
    cp1 = &cstore[i];
    cp2 = &buf[0];

    while(c = *cp1++)
        {if (c == 'E')
            {flag=TRUE;
            c = 'D';
            }
        if (cp2 < &buf[27])
            *cp2++ = c;
        }
    if (!flag)
        {*cp2++ = 'D';
        *cp2++ = '0';
        }
    *cp2++ = '\0';
    return(&buf[0]);
    }
Appendix VI - Overall Description of the Compiler



The compiler consists of four major phases. First, the lexical analysis phase (C1) transforms the source
program into a string of lexical tokens such as identifiers, constants, and operators. Second, the syntactic
analysis phase (C2) parses the token string and produces a tree representation of each function
(procedure) defined in the source program. Third, the code generation phase (C3) transforms the trees
produced by the syntactic analysis phase into an intermediate language program consisting of a sequence
of macro calls representing instructions of the particular abstract machine defined by the implementer.
Finally, the macro expansion phase (C4) expands the macro calls, producing an object language program
as the output of the compiler. In addition, there is an error message editor (C5) which is invoked last in
order to format any error messages produced by the other phases. The phases of the compiler are
invoked in sequence by the control program (CC). The control program communicates with the various
phases by passing as arguments to an invoked phase a set of character strings representing file names
and an option list; the invoked phase returns a completion code which indicates whether or not any
serious or fatal errors occurred during the execution of that phase. The various phases communicate
with each other using intermediate files.
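
The role of the control program can be sketched as follows. This is an illustration in present-day C, not
the CC program itself: the type and function names are hypothetical, the phases are modelled as function
pointers rather than separately invoked programs, and the convention that a completion code of zero means
"no serious or fatal errors" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* A phase receives file names and an option list and returns a
   completion code; 0 is assumed to mean no serious or fatal errors. */
typedef int (*phase_fn)(const char *files[], const char *options);

static int calls;   /* counts invocations of the stub phases below */

/* Stub phases used only to exercise the driver. */
static int phase_ok(const char *files[], const char *options)
{
    (void)files; (void)options;
    calls++;
    return 0;
}

static int phase_bad(const char *files[], const char *options)
{
    (void)files; (void)options;
    calls++;
    return 1;
}

/* Invoke the phases in sequence, stopping after the first one that
   reports an error, then invoke the error message editor last.
   Returns the completion code of the last phase that ran. */
int run_compiler(phase_fn phases[], size_t n, phase_fn error_editor,
                 const char *files[], const char *options)
{
    int code = 0;
    for (size_t i = 0; i < n && code == 0; i++)
        code = phases[i](files, options);
    if (error_editor != NULL)
        error_editor(files, options);
    return code;
}
```

Running the error editor even after a failure reflects the description above, in which the editor is
invoked last to format whatever messages the earlier phases produced.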

The lexical and syntax analysis phases may be run sequentially as described above, or, where a system's
program size restrictions permit, may be combined into a single phase, thus eliminating the use of an
intermediate file. This option is implemented through the use of compile-time conditionals. The remainder
of this chapter will assume that the two phases are separate.

1. The Lexical Analysis Phase

The lexical analyzer reads in the source program and breaks it into a string of tokens such as identifiers,
constants, and operators. The lexical analyzer also interprets compile-time control lines which allow one
to include source from other files and to define manifest constants. The lexical analyzer produces output
onto three intermediate files: the TOKEN file, which contains the string of tokens, the CSTORE file, which
contains the source representations of identifiers and floating-point constants, and the STRING file, which
contains character string constants. The TOKEN file is passed to the syntax analysis phase; the CSTORE
and STRING files are not used until macro expansion. In addition, the lexical analyzer may write error
messages in an internal form onto the ERROR file. A token is represented by a pair of integers called the
TYPE and the INDEX of the token. The syntax analyzer performs its analysis on the basis of the token
TYPE; thus most operators have a distinct TYPE, and there are separate TYPEs for identifiers, integer
constants, floating-point constants, and character string constants. The INDEX is used to distinguish
particular identifiers or constants; for example, the INDEX of an identifier is the index of the source
representation of the identifier in the array of characters written onto the CSTORE file.
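
The TYPE/INDEX representation can be sketched as below. The TYPE codes, the fixed-size array, and the
rule that a repeated identifier reuses its earlier CSTORE entry are assumptions made for illustration; the
text states only that the INDEX of an identifier is its position in the CSTORE character array.

```c
#include <assert.h>
#include <string.h>

enum { T_IDENT = 1, T_INTCON = 2 };     /* illustrative TYPE codes */

struct token {
    int type;       /* TYPE: what the syntax analyzer tests */
    int index;      /* INDEX: which identifier or constant  */
};

static char cstore[256];    /* NUL-terminated names stored end to end */
static int  cstore_len = 0;

/* Return the index in cstore of name, appending it if it is not
   already present (the reuse of entries is an assumption). */
int cstore_intern(const char *name)
{
    int i = 0;
    while (i < cstore_len) {
        if (strcmp(&cstore[i], name) == 0)
            return i;
        i += (int)strlen(&cstore[i]) + 1;
    }
    size_t len = strlen(name) + 1;
    assert(cstore_len + len <= sizeof cstore);
    memcpy(&cstore[cstore_len], name, len);
    cstore_len += (int)len;
    return i;       /* i equals the old cstore_len here */
}

struct token make_ident(const char *name)
{
    struct token t = { T_IDENT, cstore_intern(name) };
    return t;
}
```

With this layout a later phase can recover the spelling of an identifier from its INDEX alone, which is
how the AIDN and ADBLC routines in Appendix V use the expression &cstore[i].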

The main routine of the lexical analyzer consists of a loop which calls a routine GETTOK to return the
next token in the input stream and then writes the token onto the TOKEN file. This loop also contains
code to interpret compile-time control lines. GETTOK obtains input characters from a routine LEXGET
which contains the logic to switch the input between the primary source file and "included" files. Except
when processing character string constants, GETTOK translates the input characters using a translation
table. On GCOS, this translation maps lower case into upper case, tabs into blanks, and carriage returns
into newlines. This table would be changed when moving the compiler to a system using other than the
ASCII character set. GETTOK partitions the character set into the following character classes:

    1.  letters
    2.
    3.
    4.  quotation mark (")
    5.  newline
    6.  blank
    7.  period (.)
    8.  the escape character (\)
    9.  invalid characters
    10. characters which are unambiguously single-character operators (such as '{')
    11. characters which may begin a multi-character operator (such as '<' which may begin '<=')


GETTOK uses the character class of the current input character to determine its actions in analyzing the 
input string. 
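
The translation step that GETTOK applies before classifying a character can be sketched as a 256-entry
table. This is a minimal illustration assuming the ASCII character set; the names TRANS, INIT_TRANS and
TRANSLATE are hypothetical and only the three GCOS mappings named in the text are included.

```c
#include <assert.h>
#include <limits.h>

static unsigned char trans[UCHAR_MAX + 1];

/* Build a GCOS-style translation table: lower case to upper case,
   tab to blank, carriage return to newline; every other character
   maps to itself. Assumes 'a'..'z' are contiguous, as in ASCII. */
void init_trans(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++)
        trans[c] = (unsigned char)c;
    for (int c = 'a'; c <= 'z'; c++)
        trans[c] = (unsigned char)(c - 'a' + 'A');
    trans['\t'] = ' ';
    trans['\r'] = '\n';
}

/* Translate one input character through the table. */
int translate(int c)
{
    return trans[(unsigned char)c];
}
```

Keeping the mapping in a table means that moving to another character set changes only the table
contents, not the scanning logic.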

2. The Syntax Analysis Phase 

The syntax analyzer accepts as input the token string generated by the lexical analyzer and produces 
output onto three intermediate files for the code generation phase: a tree representation of each function 
defined in the source program is written onto the NODE file; a symbol table containing declarative 
information about identifiers is written onto the SYMTAB file; and information regarding specified initial 
values of variables is written onto the INIT file. 

The main routine of the syntax analysis phase is a table-driven LALR(1) parser. The tables are generated
by a parser-generator YACC, written by S. C. Johnson [18]. The input to YACC is a BNF-like description
of the syntax of C, augmented by action routines which are to be invoked by the parser when particular 
reductions are made. YACC analyzes the grammar and produces a set of tables written in C which are 
then compiled into the syntax analysis phase. 

The tables produced by YACC represent instructions to the parser to test the TYPE of the current input 
token, to shift the current input token onto the stack, to perform a reduction and call an action routine, or 
to report a syntax error. When a syntax error is discovered, the parser writes error messages onto the 
ERROR file which give the current state of the parse. It then attempts to recover from the error so that 
any additional syntax errors in the program can meaningfully be reported. The parser attempts a 
recovery by popping states from the stack and/or skipping input tokens in various combinations. A 
recovery attempt is considered successful if the next five input tokens are shifted without detecting a 
new syntax error. If a recovery attempt is successful, error messages are written which describe the 
recovery actions taken and parsing is continued. If a successful recovery cannot be made within a limited 
region of the input program, the parser ceases execution after writing an error message. 

The following C program illustrates the compiler's response to a syntax error, in this case unmatched 
parentheses: 

int c; 
int f(file) 

{if ((c=getc(file) != 0) return(-1);

return(0);

} 

The first error message, listed below, gives the state of the parse when the syntax error was discovered,
followed by a cursor symbol '_', followed by the next five input tokens. The next error message indicates
that the parser was able to recover from the error by skipping the next two input tokens. The resulting
program, although syntactically correct, is meaningless. Therefore, in order to avoid extraneous error
messages, the code generation phase and the macro expansion phase are not executed after syntax
errors have been detected.

3: SYNTAX ERROR. PARSE SO FAR: <ext_def_list> <function_dcl>
<block_head> IF ( <e> _ RETURN ( - 1 ) 
3: SKIPPED: RETURN ( 

The following program also contains a syntax error due to unmatched parentheses; however, since there
are no more right parentheses in the statement following the point where the error is detected, the
parser recovers from the error by deleting the unfinished IF clause.

int c; 
int f(file) 

{if ((c=getc(file) == 0) c = -1;
return(c);

} 

3: SYNTAX ERROR. PARSE SO FAR: <ext_def_list> <function_dcl>
<block_head> IF ( <e> _ C = - 1 ;
3: DELETED: IF ( <e> 

The following program is an example of a syntax error from which the parser could not recover within its
allowed limits; thus, after skipping input tokens up to this limit, the parser gives up.

int c;

int f(file)
{if ((c=getc(file) != 0) c = 1;
else c=0;
return(c);

3: SYNTAX ERROR. PARSE SO FAR: <ext_def_list> <function_dcl>
<block_head> IF ( <e> _ C = 1 ; ELSE
3: SKIPPED: C = 1 ;
4: I GIVE UP 

3. The Code Generation Phase 

The code generation phase performs the following operations: (1) allocates storage for (determines the
run-time locations of) variables, (2) performs type checks on operands and inserts conversion operators
where necessary, (3) translates the tree representation of expressions into a more descriptive form with
AMOPs, (4) performs some machine-independent optimizations on expressions, (5) emits macro calls to
define names which may be referenced by other programs (ENTRY symbols) and to declare names which
are assumed to be defined in other programs (EXTRN symbols), (6) emits macro calls to define and
initialize variables, (7) emits macro calls to execute the control statements of each function defined in the
source program, and (8) emits macro calls to evaluate expressions.

The code generation phase reads the NODE, SYMTAB, and INIT files produced by the syntax analysis 
phase and writes an intermediate language program in the form of macro calls onto two intermediate files, 
the MAC file and the HMAC file. The HMAC file contains the macro calls defining ENTRY symbols and
EXTRN symbols which are produced last by the code generation phase but which, in some systems, may
be required to appear at the beginning of the assembly language program. The MAC file contains the
remainder of the intermediate language program.

The main routine of the code generation phase consists of a call to a routine SALLOC, which allocates
run-time storage and emits macro calls to define and initialize variables, followed by a loop which reads in the
tree representation of a single C function from the NODE file and generates code (macro calls) for that 
function, followed by a call to a routine SDEF which emits macro calls to define ENTRY and EXTRN 
symbols. 

The generation of code for a C function begins with a call to a routine FHEAD with the name of the
function as an argument. FHEAD emits a PROLOG macro call which defines the entry point and produces
code to set up the proper run-time environment. FHEAD then allocates storage in the run-time stack 
frame for the automatic variables of the function; storage is allocated for automatic variables in order of 
decreasing alignment requirement so that no space is wasted in the stack frame. The stack frame is 
assumed to be aligned according to the strictest of the alignment requirements of the various C data 
types (usually that of double-precision floating-point). A save area of the size specified in the machine 
description is reserved at the beginning of the stack frame. 
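
The allocation rule used by FHEAD can be sketched as follows. This is an illustration in present-day C
with hypothetical names; it assumes, as for the HIS-6000 sizes and alignments, that each size is a multiple
of its alignment and that the save area ends on the strictest alignment boundary.

```c
#include <assert.h>
#include <stdlib.h>

struct autovar {
    const char *name;
    int size;       /* in bytes */
    int align;      /* required alignment in bytes */
    int offset;     /* assigned offset within the stack frame */
};

/* Order variables by decreasing alignment requirement. */
static int by_align_desc(const void *a, const void *b)
{
    const struct autovar *x = a, *y = b;
    return y->align - x->align;
}

/* Assign frame offsets following a save area of the given size.
   Returns the offset just past the last variable. */
int layout_frame(struct autovar *v, size_t n, int savearea)
{
    int off = savearea;
    qsort(v, n, sizeof *v, by_align_desc);
    for (size_t i = 0; i < n; i++) {
        /* round off up to a multiple of the variable's alignment */
        off = (off + v[i].align - 1) / v[i].align * v[i].align;
        v[i].offset = off;
        off += v[i].size;
    }
    return off;
}
```

For a char, a double, an int and another char after a 16-byte save area, allocation in declaration order
would end at offset 37 because of padding before the double, whereas the sorted order ends at offset 30
with no padding.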

The call to FHEAD is followed by a call to the routine STMT to generate code for the compound statement 
which is the body of the C function. The generation of code for the body of a C function occurs on two 
levels, the statement level and the expression level. The generation of code for statements is handled by 
the routine STMT which takes one argument, a pointer to a subtree representing a C statement. STMT is 
actually a very short routine which makes recursive calls to itself for the branches of a STATEMENT_LIST
node and calls a larger routine ASTMT if the specified node is an actual statement (as opposed to a 
statement list). The purpose of splitting code generation for statements into the two routines STMT and 
ASTMT is to minimize the amount of stack space used while recursively descending the statement tree. 

Following the call to STMT to generate code for the body of the C function, the size of the stack frame is 
adjusted to be a multiple of the stack alignment and an EPILOG macro call is emitted. On the HIS-6000, 
the EPILOG macro defines an assembly-language symbol whose value is the stack frame size; this symbol 
is referred to by the code produced by the PROLOG macro which allocates the stack frame. 

4. The Macro Expansion Phase 

The macro expansion phase expands the macro calls on the HMAC and MAC intermediate files using the 
information on the CSTORE and STRING intermediate files and places the result of that expansion on the 
output file. The macro expander is not a general-purpose macro processor; in particular, there are no 
built-in macro calls for defining macros or for handling local or global variables. Furthermore, the total 
number of characters (after any macro expansion) in the argument list of a macro call is limited to 100. 
The maximum allowed depth of nested macro calls is 10. 

The macro expander processes a stream of characters terminated by a NULL character. Within this
stream of characters, the characters '%', '#', and '\' have special significance. The '%' character indicates
the beginning of a macro call, which consists of the '%', followed by the name of the macro, followed by a
(possibly null) list of character string arguments separated by commas and enclosed in parentheses. The
'#' character is used within the body of a macro definition to refer to the arguments of the macro call; the
character sequences '#0' through '#9' refer to arguments 0 through 9, respectively. The '\' character is
an escape character. The special interpretation of a character such as '%' or '#' is inhibited when
that character is preceded by a '\'. In addition, the character sequences '\t', '\n', and '\r' are used to
represent tab, newline, and carriage-return, respectively. A '\' character followed by a newline character
results in both characters being ignored; thus a macro which expands to a backslash will swallow the
newline which followed the macro call in the input file. (A macro call in the input file which expands to
the null string will leave a blank line in the compiler output; this is generally a sign that the implementer
has not completely specified the macro definition for an AMOP.) The backslash character itself is
represented by '\\'.
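
The argument and escape conventions can be sketched as a single pass over a macro body. This is a
simplified illustration, not the macro expander itself: embedded '%' macro calls are not handled,
substituted argument text is copied literally rather than rescanned, and a reference to a missing argument
is assumed to expand to nothing. The function name EXPAND_BODY is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand '#0'..'#9' references and backslash escapes in body,
   writing the result to out. Returns 0 on success, or -1 if the
   result does not fit in outsize bytes. */
int expand_body(const char *body, char *const argv[], int argc,
                char *out, size_t outsize)
{
    size_t n = 0;
    if (outsize == 0)
        return -1;
    for (const char *p = body; *p != '\0'; p++) {
        const char *piece;
        char one[2] = {0, 0};
        if (*p == '#' && p[1] >= '0' && p[1] <= '9') {
            int k = *++p - '0';             /* argument number */
            piece = (k < argc) ? argv[k] : "";
        } else if (*p == '\\' && p[1] != '\0') {
            char e = *++p;
            if (e == '\n')
                continue;                   /* backslash-newline: both ignored */
            one[0] = (e == 't') ? '\t' : (e == 'n') ? '\n'
                   : (e == 'r') ? '\r' : e; /* otherwise: literal character */
            piece = one;
        } else {
            one[0] = *p;
            piece = one;
        }
        size_t len = strlen(piece);
        if (n + len >= outsize)
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

A body such as the characters \tLD#0\t0,#1\n, expanded with the arguments "A" and "X1", yields a
tab, LDA, a tab, 0,X1 and a newline.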

The normal operation of the macro expander consists of copying characters directly from the input stream
to the output stream. When a '%' is encountered, the name of the macro and the arguments of the macro
call are evaluated and collected in a buffer; this evaluation may itself involve the processing of embedded
macro calls. The input stream is then switched to the body of the macro definition and normal processing
is resumed. When a '#' is encountered, the argument number is read and the input stream is switched to
the corresponding character string argument of the current macro call, which is stored in the associated
buffer. Normal processing is then resumed. The input stream operates in a stack-like manner in that
when the end of a macro definition or an argument string is reached, the input stream is restored to its
previous state. When end of file is reached on the HMAC file, the input stream is switched to the MAC
file; when end of file is reached on the MAC file, macro expansion is terminated.

There are three types of macros which are handled by the macro expander. First, there are the macros
representing three-address abstract machine instructions, which are produced by the code generator
while processing expressions. These macros are defined only in the machine description; the macro calls
are of a special form which directly specifies the internal number of the corresponding macro definition,
as assigned by GT. For example, the macro call %3 refers to macro definition number 3. Second, there
are the keyword macros which are produced by the code generator for function definitions
and statements. These macros may be defined either in the machine description or by C routines; the
macro calls specify the macro names as given in Appendix III. Finally, there are the macros which are
created by the implementer and used within other macro definitions. These macros may be defined either
in the machine description or by C routines; the macro calls specify the macro name as defined by the
implementer.

A macro which is defined in the machine description is specified as a list of one or more character string
constants, possibly with associated location prefixes for conditional expansion. Such a macro definition is
implemented as a list of pointers to the character string constants, along with associated integers
representing the conditions specified in the location prefixes, if any. The lists are accessed through an
array MACDEF, produced by GT, which is indexed by the internal macro definition number assigned by
GT to each macro definition in the machine description. As mentioned above, a macro call representing a
three-address abstract machine instruction directly specifies the macro definition number. Other macros
defined in the machine description are represented in a table produced by GT which associates the macro
names with the corresponding macro definition numbers.

Macros defined by C routines are represented in a table provided by the implementer which associates
the macro names with the corresponding C functions. This table consists of an array FN of pointers to
the character string macro names, an array FF of pointers to the corresponding C functions, and an
integer NFN specifying the number of entries in the table. It would be more convenient for the
implementer to specify the C macro definitions in the machine description and have GT construct NFN, FN
and FF; however, this was not done because of the lexical difficulties associated with including C source in
the machine description.
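
The FN, FF and NFN table can be sketched in present-day C as two parallel arrays and a count. The two
entries and the lookup function FIND_MACRO are illustrative assumptions; the strings returned by the two
routines follow the AEND and AINT routines of Appendix V.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef const char *(*macro_fn)(int argc, char *argv[]);

/* Two sample C routine macros. */
static const char *m_end(int argc, char *argv[])
{
    (void)argc; (void)argv;
    return "\tEND";
}

static const char *m_int(int argc, char *argv[])
{
    (void)argc; (void)argv;
    return "\tDEC\t#0";     /* '#0' is filled in when the result is rescanned */
}

static const char *fn[] = { "end", "int" };   /* macro names (illustrative) */
static macro_fn    ff[] = { m_end, m_int };   /* corresponding functions    */
static int         nfn  = 2;                  /* number of table entries    */

/* Return the C function for the named macro, or NULL if the name
   is not in the table. */
macro_fn find_macro(const char *name)
{
    for (int i = 0; i < nfn; i++)
        if (strcmp(fn[i], name) == 0)
            return ff[i];
    return NULL;
}
```

A linear search is adequate here because the table is small; the HIS-6000 table in Appendix V has 18
entries.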

The macro expander is implemented as two levels of get-character routines. The lower level routine,
GETC1, returns the next character from the current input source, which may be either the input file
(HMAC or MAC intermediate file) or a character string in memory. If it is a character string, it may be
part of a definition of a macro specified in the machine description, an argument of the current macro call,
or the result returned by a C routine macro definition. The current state of the input stream is kept in a
stack of structures called input control blocks (ICBs). GETC1 uses the top ICB on the stack to determine
the source of the next character. The members of an ICB structure are listed below with their meanings:

F         a flag indicating the type of the current input source (the input file, a macro
          defined in the machine description, or a character string)

LOCP      if the current input source is a macro defined in the machine description, this is a
          pointer to the current position in the list containing the pointers to the character
          strings which make up the macro definition

CP        if the current input source is not the input file, this is a pointer to the next
          character in the current character string

ARGV[10]  an array of pointers to the character string arguments of the current macro call

BASE[3]   the REF.BASEs of the result, the first operand, and the second operand of the
          current macro call, used when computing conditional expansion

A NULL character indicates the end of a character string or end-of-file on an input file; thus if the current 
input character is NULL, GETC1 updates the current state of the input stream by advancing LOCP or by 
popping an ICB off the stack or by switching the input file from the HMAC to the MAC intermediate file. 
GETC1 returns the NULL character only upon end-of-file on the MAC intermediate file. 
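
The behavior of GETC1 on in-memory sources can be sketched as below. This is a simplified illustration, not the compiler's code: file input, ARGV, and BASE are omitted, the stack depth MAXICB is an arbitrary assumption, and push_string and push_macro are hypothetical helpers. A macro definition is modelled as a NULL-terminated list of strings.

```c
#include <stddef.h>

enum { SRC_STRING, SRC_MACRO };      /* values of F; file input is omitted */

struct icb {
    int f;                           /* F: type of the current input source */
    const char *const *locp;         /* LOCP: position in a macro's string list */
    const char *cp;                  /* CP: next character in the current string */
};

#define MAXICB 20                    /* assumed limit, not from the paper */
static struct icb stack[MAXICB];
static int top = -1;                 /* index of the top ICB; -1 when empty */

/* Push a plain character string as the new input source. */
int push_string(const char *s)
{
    if (top + 1 >= MAXICB) return -1;
    stack[++top] = (struct icb){ SRC_STRING, NULL, s };
    return 0;
}

/* Push a macro definition given as a NULL-terminated list of strings. */
int push_macro(const char *const *list)
{
    if (list[0] == NULL) return 0;   /* empty definition: nothing to read */
    if (top + 1 >= MAXICB) return -1;
    stack[++top] = (struct icb){ SRC_MACRO, list, list[0] };
    return 0;
}

/* Return the next character, or 0 once every source is exhausted. */
int getc1(void)
{
    while (top >= 0) {
        struct icb *p = &stack[top];
        if (*p->cp != '\0')
            return (unsigned char)*p->cp++;
        if (p->f == SRC_MACRO && *++p->locp != NULL)
            p->cp = *p->locp;        /* advance LOCP to the next string */
        else
            top--;                   /* pop the exhausted ICB */
    }
    return 0;
}
```

Because an exhausted source is popped inside the loop, a caller never sees the end of an individual string; reading resumes transparently in whatever source was active before the push.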

The higher level get-character routine is MGET, which implements the '#', '%', and '\' conventions. MGET
begins by calling GETC1 to obtain a character. If the character returned is a backslash, then GETC1 is
called again to obtain the second character of the escape sequence and the appropriate action is taken:
If the escape sequence is '\t', '\n', or '\r', then the character is taken to be tab, newline, or carriage
return, respectively. If the second character is a newline, then it is ignored, and MGET returns the result 
of a recursive call to itself. Otherwise, the second character is returned as the value of MGET (thus it is 
protected from special interpretation). 
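
The escape-sequence step alone can be sketched as follows. The routine name mget_escape and the helper set_input are hypothetical, and getc1 here is a stand-in that reads from a single string rather than from the ICB stack. Unlike the real MGET, this sketch does not record that a returned character was protected, so it covers only the character translation.

```c
#include <stddef.h>

static const char *src = "";         /* stand-in for the GETC1 input stack */

/* Select the string to be read (hypothetical helper for this sketch). */
void set_input(const char *s) { src = s; }

/* Simplified GETC1: next character of the string, or 0 at its end. */
static int getc1(void) { return *src ? (unsigned char)*src++ : 0; }

/* Read one character, translating backslash escape sequences. */
int mget_escape(void)
{
    int c = getc1();
    if (c != '\\')
        return c;                    /* ordinary character */
    c = getc1();                     /* second character of the escape */
    switch (c) {
    case 't':  return '\t';
    case 'n':  return '\n';
    case 'r':  return '\r';
    case '\n': return mget_escape(); /* escaped newline is ignored */
    default:   return c;             /* any other character stands for itself */
    }
}
```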

If the resulting character is not a '#' or a '%', then MGET returns that character directly. A '#' followed
by a digit results in pushing a new ICB onto the stack pointing to the appropriate character string
argument of the current macro call. A '#' followed by 'O', 'F', 'S', or 'R' (see Appendix I, section 3) results
in a call to the C routine ANAME (which implements the NAME macro) with the appropriate arguments.
When a '%' is encountered, the macro name is collected and the arguments are assembled into a
100-character buffer. The macro name and the arguments are obtained by recursive calls to MGET so that
embedded macro calls are expanded; the result of expanding an embedded macro call may include commas 
or right parentheses without interfering with the argument structure of the macro call being processed. 
If the macro name is an integer, the correspondingly numbered macro definition from the machine 
description is used; otherwise, the macro name is looked up in a hash table containing the names of all 
defined macro names. If the macro is defined in the machine description, a new ICB is pushed onto the 
stack with LOCP pointing to the beginning of the list of pointers to character strings which represents the 
macro definition. Otherwise, if the macro is defined by a C routine, the C function is called and an ICB is 
pushed onto the stack which points to the character string returned by that function; thus references to 
arguments and embedded macro calls in the string returned by the C function are processed. MGET then 
resumes normal operation by calling GETC1. Note that the effect of a call to an undefined macro is to 
replace the macro call by the null string; no error messages are produced by the macro expander. 

The main routine of the macro expander consists of initialization, including the setting up of the hash 
table, followed by a loop which calls MGET repeatedly and writes the returned character onto the output 
file; this loop terminates when the returned character is NULL.

5. The Error Message Editor 

The error message editor is invoked as the last phase of the compiler to read from the ERROR 
intermediate file the error records written by the previous phases and to print error messages 
corresponding to those error records. The error message editor allows variable data, such as identifier 
names, to be included in the printed messages. In addition, error messages of arbitrary length can be
constructed from a sequence of error records; the error message editor automatically breaks long output
lines so that all output lines fit within a fixed page width.
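
One simple way to break lines as text accumulates is to append a word at a time and start a new line whenever the next word would overflow the page. The sketch below assumes breaks occur only between blank-separated words and uses an arbitrary PAGE_WIDTH; the paper states neither the width nor the breaking rule, and struct line_buf and put_word are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_WIDTH 20                /* assumed value; the text gives no width */

struct line_buf {
    char  *out;                      /* destination buffer, kept NUL-terminated */
    size_t cap;                      /* capacity of out in bytes */
    size_t len;                      /* characters stored so far */
    size_t col;                      /* characters on the current output line */
};

/* Append one word, breaking the line first if the word would not fit.
   Returns 0 on success, -1 if the buffer has no room. */
int put_word(struct line_buf *b, const char *word)
{
    size_t wl  = strlen(word);
    size_t sep = b->col > 0 ? 1 : 0;             /* separator needed? */
    if (b->len + sep + wl + 1 > b->cap)
        return -1;
    if (sep) {
        int brk = b->col + 1 + wl > PAGE_WIDTH;  /* would the word overflow? */
        b->out[b->len++] = brk ? '\n' : ' ';
        b->col = brk ? 0 : b->col + 1;
    }
    memcpy(b->out + b->len, word, wl + 1);       /* copies the terminating NUL */
    b->len += wl;
    b->col += wl;
    return 0;
}
```

Because the column count is carried in the buffer state, the pieces of a message can arrive in separate calls, much as the text of one message arrives in separate error records.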

An error record is a structure containing seven integers: an error number, a line number, and five 
arguments. The error number selects a basic error message string which contains the fixed text of the 
error message and optional indicators for including variable data. An indicator is a two-character 
sequence beginning with '%'; the character following the '%' defines the interpretation of the variable
data which will replace the indicator when the string is printed. The variable data is specified by one or
more of the arguments in the error record. The indicators are processed from left to
right; arguments are used as needed according to the interpretations specified by the indicators. The
various indicators are listed below with their interpretations:

%d   print the next argument as a decimal integer

%m   print the string in the internal compiler table CSTORE which begins at the index
     specified by the next argument

%n   print a string representing a node (operator) of the internal representation produced by
     the syntax analysis phase, as specified by the next argument

%q   print a string representing the terminal or nonterminal symbol associated with the
     parser state specified by the next argument

%t   print the source representation of the token whose TYPE and INDEX are specified by
     the next two arguments

%%   print a '%'

Only the arguments which are referenced by the basic error message string are specified when an error 
record is written; the values of the remaining arguments in the record are undefined.
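
A minimal sketch of indicator expansion follows, assuming the indicator character is '%' and handling only the decimal, CSTORE-string, and literal-percent cases; the indicators for nodes, parser states, and tokens need compiler tables and are skipped here. The structure layout, the routine expand_message, and passing CSTORE as a parameter are assumptions made for the illustration.

```c
#include <stddef.h>
#include <stdio.h>

struct error_record {
    int number;                      /* selects the basic error message string */
    int lineno;                      /* source line number, if any */
    int arg[5];                      /* variable data; unused entries undefined */
};

/* Expand msg into out (capacity cap >= 1), consuming arguments left to
   right. Returns the number of characters stored, excluding the NUL. */
size_t expand_message(const char *msg, const struct error_record *r,
                      const char *cstore, char *out, size_t cap)
{
    size_t n = 0;
    int next = 0;                    /* index of the next unused argument */
    out[0] = '\0';
    while (*msg != '\0' && n + 1 < cap) {
        if (*msg != '%' || msg[1] == '\0') {
            out[n++] = *msg++;       /* fixed text is copied unchanged */
            out[n] = '\0';
            continue;
        }
        int w = 0;
        switch (msg[1]) {
        case 'd':
            if (next < 5) w = snprintf(out + n, cap - n, "%d", r->arg[next++]);
            break;
        case 'm':
            if (next < 5) w = snprintf(out + n, cap - n, "%s", cstore + r->arg[next++]);
            break;
        case '%':
            w = snprintf(out + n, cap - n, "%%");
            break;
        default:
            break;                   /* other indicators are not handled here */
        }
        msg += 2;
        if (w < 0) break;
        n += (size_t)w < cap - n ? (size_t)w : cap - n - 1;  /* truncated? */
    }
    return n;
}
```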

The line number field in the error record associates a line in the source program with the error which 
produced the error record. If a line number is given (LINENO > 0), it is printed out on a new line,
followed by a colon, followed by the text specified by the error record; otherwise (LINENO <= 0), the text
specified by the error record is written on the current output line. Thus an error message consists of an initial
error record containing a line number followed by zero or more error records without line numbers. In
this way, a syntax error message can be built up
from the following basic error message strings:

"SYNTAX ERROR. PARSE SO FAR: "
"%q" (for each state on the parser stack)
(represents the input cursor)
"%t" (for each of the next 5 input tokens)

The syntax analysis phase can produce these error messages without counting the symbols in the
message or knowing their lengths because the error message editor takes care of breaking long output
lines.

In addition to selecting a basic error message string, an error number represents the severity level of
the corresponding error: 



error number    severity

1000 - 1999     error

2000 - 3999     serious error

4000 - 5999     fatal error

6000 - 6999     compiler error

A fatal error or a compiler error will terminate the current phase, and no remaining phases (except the
error message editor) will be invoked; in addition, a compiler error message is preceded by
the string

"COMPILER ERROR."

A serious error allows the current phase to complete, but the remaining phases (except the error
message editor) are skipped.
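
The mapping from error number to severity, and the effect of each severity on the remaining phases, can be sketched as follows. The enumeration and function names are hypothetical, and the lower bound of 4000 for fatal errors is inferred from the neighbouring ranges rather than read directly from the table.

```c
/* Severity levels in increasing order; SEV_UNKNOWN covers numbers
   outside every listed range (an assumption of this sketch). */
enum severity { SEV_UNKNOWN, SEV_ERROR, SEV_SERIOUS, SEV_FATAL, SEV_COMPILER };

/* Map an error number onto its severity level. */
enum severity classify(int errnum)
{
    if (errnum >= 1000 && errnum <= 1999) return SEV_ERROR;
    if (errnum >= 2000 && errnum <= 3999) return SEV_SERIOUS;
    if (errnum >= 4000 && errnum <= 5999) return SEV_FATAL;
    if (errnum >= 6000 && errnum <= 6999) return SEV_COMPILER;
    return SEV_UNKNOWN;
}

/* Nonzero if later phases, other than the error message editor, are skipped. */
int skips_remaining_phases(enum severity s) { return s >= SEV_SERIOUS; }

/* Nonzero if the current phase is terminated at once. */
int terminates_phase(enum severity s) { return s >= SEV_FATAL; }
```

Ordering the enumerators by severity lets both policy questions reduce to a single comparison.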

The error message editor writes its output onto the standard output unit, which is normally the user's
terminal in a time-sharing system or a line printer in a batch system. However, when the compiler is
submitted as a batch job by a time-sharing user, this output is redirected onto an error listing file. This
is accomplished by passing an argument to the error message editor which indicates that output
to the standard output unit is to be appended onto filecode EL (the error listing file). Redirection of
standard input and output is a (not necessarily portable) feature of the C run-time system, rather than of
the compiler itself.

6. Invoking the Compiler Phases 

The mechanisms for invoking a phase of the compiler, passing arguments to it, and returning a completion
code are operating system dependent. In general, the control program must be rewritten for each system
on which the compiler runs; on some systems, the control program may be replaced by a set of job
control cards (see Figure 1 on page 31). The source of the compiler phases need not be changed,
however; the operating system dependencies associated with the invocation of a C program are isolated
in two run-time routines, the startup routine and the exit routine. The startup routine receives control
from the operating system, establishes the C run-time environment, and calls the C routine named MAIN.
It is the responsibility of the startup routine to take the character string arguments, which may be
provided by the operating system or written on a temporary file, and arrange them as an array of
character strings which is then passed as an argument to MAIN. The exit routine EXIT is called upon a
return from MAIN; it may also be called directly by a C program. The exit routine closes all open files
and returns control to the operating system. EXIT has one optional argument, a return code, which it
communicates to the control program as a completion or abort code or by writing it onto a temporary file.

On UNIX, a phase of the compiler is invoked by calling the system routine FORK, which creates a new
process, followed by a call in the new process to the system routine EXECL, which overwrites the process
with the desired phase of the compiler and passes it a list of character strings as arguments. The old
process waits for the execution of the compiler phase to finish by calling the system routine WAIT, which
waits for the process to die and returns its completion code. 
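
On a present-day POSIX system the same sequence can be sketched as below. This is an illustration rather than the control program's code: the routine name run_phase is hypothetical, execv is used in place of EXECL so that the arguments can be passed as an array, and waitpid is used in place of WAIT.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program at path with the given NULL-terminated argument list.
   Returns its exit code, or -1 if it could not be run to completion. */
int run_phase(const char *path, char *const argv[])
{
    pid_t pid = fork();              /* create the new process */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);           /* overwrite the child with the phase */
        _exit(127);                  /* reached only if execv failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)    /* wait for the phase to finish */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A control program built on this routine would call it once per phase and stop invoking phases when a nonzero completion code indicates that the remaining phases should be skipped.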

On GCOS, two methods are used to invoke a phase of the compiler from the control program, which runs 
in time-sharing. The first method uses a routine SYSTEM, a C-callable interface to the system call CALLSS
which can invoke any time-sharing subsystem (program). The character string arguments are passed in
the system teletype buffer (using the system call PSEUDO) so that to the invoked program it appears that
it was invoked by a command typed at command level with those arguments. The completion code is
stored (using the system call CORFIL) in the first word of the core file, a ten word buffer provided by the
operating system for communication between a user's subsystems. The disadvantage of running the
compiler phases in time-sharing is that the compiler phases, being large programs, can take a very large
elapsed time to run. Thus this method is used only for the error message editor, which prints error
messages on the user's terminal. 
The second method uses a routine TASK, a C-callable interface to the TASK system call, to submit a
program as a special, high-priority batch activity. The elapsed time for a TASK activity is typically much
lower than for the same program run in time-sharing. The character string arguments are written onto a
temporary file which is read by the startup routine when in batch. The completion code is handled as
follows: if there is no argument to EXIT or the argument is 0, EXIT terminates normally and TASK will
return a status code of 0. Otherwise, EXIT aborts with the completion code as the abort code; the abort
code is then returned in the status code by TASK. 

The compiler phases can also be invoked as normal GCOS batch activities by the sequence of control
cards shown in Figure 1. When these cards are submitted, IDENT and USERID cards are inserted at the
beginning of the deck and the characters '#' and '%' are replaced by the user's identification and the basic
component of the source file name, respectively. Thus if the user is TP and the source file is "B/TEST.C",

W*!^^ thp flip 3mW mt.Sm.«m <mu&* will be 

written onto the file •tfiwrr. The generation of the control cards and the submission of the batch job
is performed by a time-sharing program (command). As the turnaround time for a normal batch job can
be quite long, this version of the compiler is used only for those programs which are too large to compile
using the other version of the compiler.



